Abstract

In this paper, we address low-rank matrix completion problems with fixed basis
coefficients, which include low-rank correlation matrix completion in various
fields such as the financial market and low-rank density matrix completion from
quantum state tomography. For this class of problems, the efficiency of the common
nuclear norm penalized estimator for recovery may be challenged. Here, with
a reasonable initial estimator, we propose a rank-corrected procedure to generate
an estimator of high accuracy and low rank. For this new estimator, we establish
a non-asymptotic recovery error bound and analyze the impact of adding the
rank-correction term on improving the recoverability. We also provide necessary
and sufficient conditions for rank consistency in the sense of Bach [3], in which
the concept of constraint nondegeneracy in matrix optimization plays an important
role. As a byproduct, our results provide a theoretical foundation for the majorized
penalty method of Gao and Sun [25] and Gao [24] for structured low-rank matrix
optimization problems.

Keywords: matrix completion, fixed basis coefficients, low-rank, convex optimization,
rank consistency, constraint nondegeneracy.



1 Introduction 

Low-rank matrix completion refers to recovering an unknown low-rank matrix,
exactly or approximately, from under-sampled observations with or without noise.
This problem is of considerable interest in many application areas, from machine learning
to quantum state tomography. A basic idea for addressing a low-rank matrix completion
problem is to minimize the rank of a matrix subject to certain constraints involving the
observations. Given that direct minimization of the rank function is generally NP-hard,
a widely used convex relaxation approach is to replace the rank function with the nuclear
norm, as the latter is the convex envelope of the rank function over the unit ball of the
spectral norm [19].

Nuclear norm minimization (NNM) has long been observed to provide a low-rank solution
in practice (see, e.g., [51, 50, 19]). The first theoretical characterization
of the minimum rank solution of the NNM was given by Recht, Fazel and Parrilo [60],
with the help of the concept of the Restricted Isometry Property (RIP). Recognizing that
the matrix completion problem does not obey the RIP, Candes and Recht [8] introduced
the concept of the incoherence property and proved that most low-rank matrices can be
exactly recovered from a surprisingly small number of noiseless observations of randomly
sampled entries via the NNM. The bound on the number of sampled entries was later
improved to be near-optimal by Candes and Tao [9] through a counting argument. Such a
bound was also obtained by Keshavan et al. [34] for their proposed OptSpace algorithm.
Later, Gross [28] sharpened the bound by employing a novel technique from quantum
information theory developed in [29], and extended noiseless observations of
entries to coefficients relative to any basis. This technique was also adapted by Recht [59].
All the above results focus on noiseless matrix completion. Matrix completion with
noise was first addressed by Candes and Plan [7]. More recently, nuclear norm penalized
estimators for matrix completion with noise have been well studied by Koltchinskii,
Lounici and Tsybakov [41], Negahban and Wainwright [54], and Klopp [37]. Besides the
nuclear norm, several other penalties for matrix completion have also been studied in
[Ml Eg sa [661123].

The NNM has been demonstrated to be a successful approach to encourage a low-rank
solution in many situations. However, the efficiency of the NNM may be challenged
under general sampling schemes. For example, the conditions characterized by Bach [3]
for rank consistency of the nuclear norm penalized least squares estimator may not be
satisfied. In particular, for matrix completion problems, Salakhutdinov and Srebro [65]
showed that when certain rows and/or columns are sampled with high probability, the
NNM may fail in the sense that the number of observations required for recovery is much
larger than in the setting of most matrix completion problems. Negahban and Wainwright
[54] also pointed out the impact of such heavy sampling schemes on the recovery error
bound. As a remedy, a weighted nuclear norm (trace norm), based on the row and
column marginals of the sampling distribution, was suggested in [54, 65, 22] for the case
where prior information on the sampling distribution is available.

When the true matrix possesses a symmetric/Hermitian positive semidefinite structure,
the impact of general sampling schemes on the recoverability of the NNM is more
remarkable. In this situation, the nuclear norm reduces to the trace and thus depends
only on the diagonal entries rather than on all entries as the rank function does. As a result,
if diagonal entries are heavily sampled, the ability of the NNM to promote a low-rank
solution, as well as the recoverability, will be highly weakened. This phenomenon is fully
reflected in the widely used correlation matrix completion problem, for which the nuclear
norm becomes a constant and completely loses effectiveness for matrix recovery. Another
example of particular interest in quantum state tomography is to recover a density matrix
of a quantum system from Pauli measurements (see, e.g., [29, 21, 70]). A density matrix
is a Hermitian positive semidefinite matrix of trace one. Obviously, if the constraints
of positive semidefiniteness and trace one are simultaneously imposed on the NNM, the
nuclear norm completely fails in promoting a low-rank solution. Thus, one of the two
constraints has to be abandoned in the NNM and then restored in the post-processing
stage. In fact, this idea has been explored in [29, 21], and the numerical results there
indicated its relative efficiency, though it is at best sub-optimal.

In this paper, with a strong motivation to optimally address the difficulties in correlation
and density matrix completion problems, we propose a low-rank matrix completion
model with fixed basis coefficients. In our setting, for any given basis of the matrix
space, a few basis coefficients of the true matrix are assumed to be fixed due to a certain
structure or some prior information, and the rest are allowed to be observed with noise
under general sampling schemes. Certainly, one can apply the nuclear norm penalized
technique to our model. The challenge is that, as argued earlier, this may not yield
a desired low-rank solution with small estimation errors. Here, we introduce a
rank-correction step to address this critical issue, provided that a reasonable initial estimator
is available. A satisfactory choice of the initial estimator is the nuclear norm penalized
estimator or one of its analogues. The rank-correction step solves a convex "nuclear norm
- rank-correction term + proximal term" regularized least squares problem with fixed
basis coefficients (and the possible positive semidefinite constraint). The rank-correction
term is a linear term constructed from the initial estimator, and the proximal term is a
quadratic term added to ensure the boundedness of the solution to the convex problem. The
resulting convex matrix optimization problem can be solved by the efficient algorithms
recently developed in [31, 32, 33], even for large-scale cases.

The idea of using a two-stage or even multi-stage procedure is not brand new for
dealing with sparse recovery in the statistical and machine learning literature. The
$\ell_1$-norm penalized least squares method, also known as the Lasso [67], is very attractive and
popular for variable selection in statistics, thanks to the invention of the fast and efficient
LARS algorithm [12]. On the other hand, the $\ell_1$-norm penalty has long been known
by statisticians to yield biased estimators and cannot attain the estimation optimality
[14, 18]. The issue of bias can be overcome by nonconvex penalization methods; see, e.g.,
[43, 13, 73]. A multi-stage procedure naturally occurs if the nonconvex problem obtained
is solved by an iterative algorithm [77]. In particular, once a good initial estimator is
used, a two-stage estimator is enough to achieve the desired asymptotic efficiency, e.g.,
the adaptive Lasso proposed by Zou [76]. There are also a number of important papers
in this line on variable selection, including jlHl [Ml [711 EQI [TSl [Ml [15], to name only
a few. For a broad overview, interested readers are referred to the recent survey
papers [16, 17]. It is natural to extend the ideas from the vector case to the matrix
case. Recently, Bach [3] made an important step in extending the adaptive Lasso of Zou
[76] to the matrix case for seeking rank consistency under general sampling schemes.
However, it is not clear how to apply Bach's idea to our matrix completion model with
fixed basis coefficients, since the required rate of convergence of the initial estimator for
achieving asymptotic properties is no longer valid as far as we can see. More critically,
there are numerical difficulties in efficiently solving the resulting optimization problems.
Such difficulties also occur when the reweighted nuclear norm proposed by Mohan and
Fazel [53] is applied to rectangular matrix completion problems.

The rank-correction step proposed in this paper is designed to overcome
the above difficulties. This approach is inspired by the majorized penalty method recently
proposed by Gao and Sun [25] for solving structured matrix optimization problems with a
low-rank constraint. For our proposed rank-correction step, we provide a non-asymptotic
recovery error bound in the Frobenius norm, following an argument similar to that adopted by
Klopp in [37]. The obtained error bound indicates that adding the rank-correction term
could help to substantially improve the recoverability. As the estimator is expected
to be of low rank, we also study the asymptotic property of rank consistency in the
sense of Bach [3], under the setting that the matrix size is assumed to be fixed. This
setting may not be ideal for analyzing asymptotic properties for matrix completion, but
it does allow us to take the crucial first step to gain insights into the limitation of the
nuclear norm penalization. Among others, the concept of constraint nondegeneracy for
conic optimization problems plays a key role in our analysis. Interestingly, our results on
the recovery error bound and rank consistency suggest a consistent criterion for constructing
a suitable rank-correction function. In particular, for the correlation and density matrix
completion problems, we prove that rank consistency automatically holds for a broad
selection of rank-correction functions. To achieve better performance for recovery, the
rank-correction step may be applied iteratively several times, especially when the sample
ratio is relatively low. Finally, we remark that our results can also be used to provide a
theoretical foundation for the majorized penalty method of Gao and Sun [25] and Gao
[24] for structured low-rank matrix optimization problems.

This paper is organized as follows. In Section 2, we introduce the observation model of
matrix completion with fixed basis coefficients and the formulation of the rank-correction
step. In Section 3, we establish a non-asymptotic recovery error bound and discuss
the impact of the rank-correction term on recovery. Section 4 provides necessary and
sufficient conditions for rank consistency. Section 5 is devoted to the construction of
the rank-correction function. In Section 6, we report numerical results to validate the
efficiency of our proposed rank-corrected procedure. We conclude this paper in Section
7. All proofs are deferred to the Appendix.

Notation. Here we provide a brief summary of the notation used in this paper. 

• Let $\mathbb{R}^{n_1\times n_2}$ and $\mathbb{C}^{n_1\times n_2}$ denote the space of all $n_1 \times n_2$ real and complex matrices,
respectively. Let $\mathcal{S}^n$ ($\mathcal{S}^n_+$, $\mathcal{S}^n_{++}$) denote the set of all $n \times n$ real symmetric (positive
semidefinite, positive definite) matrices and $\mathcal{H}^n$ ($\mathcal{H}^n_+$, $\mathcal{H}^n_{++}$) denote the set of all
$n \times n$ Hermitian (positive semidefinite, positive definite) matrices. Let $\mathbb{S}^n$ ($\mathbb{S}^n_+$, $\mathbb{S}^n_{++}$)
represent $\mathcal{S}^n$ ($\mathcal{S}^n_+$, $\mathcal{S}^n_{++}$) for the real case and $\mathcal{H}^n$ ($\mathcal{H}^n_+$, $\mathcal{H}^n_{++}$) for the complex case.

• Let $\mathbb{V}^{n_1\times n_2}$ represent $\mathbb{R}^{n_1\times n_2}$, $\mathbb{C}^{n_1\times n_2}$, $\mathcal{S}^n$ or $\mathcal{H}^n$. We define $n := \min(n_1, n_2)$ for
the previous two cases and stipulate $n_1 = n_2 = n$ for the latter two cases. Let
$\mathbb{V}^{n_1\times n_2}$ be endowed with the trace inner product $\langle \cdot, \cdot \rangle$ and its induced norm $\|\cdot\|_F$,
i.e., $\langle X, Y \rangle := \mathrm{Re}(\mathrm{Tr}(X^{\mathbb{T}} Y))$ for $X, Y \in \mathbb{V}^{n_1\times n_2}$, where "Tr" stands for the trace
of a matrix and "Re" means the real part of a complex number.

• For the real case, $\mathbb{O}^{n\times k}$ denotes the set of all $n \times k$ real matrices with orthonormal
columns, and for the complex case, $\mathbb{O}^{n\times k}$ denotes the set of all $n \times k$ complex
matrices with orthonormal columns. When $k = n$, we write $\mathbb{O}^{n\times k}$ as $\mathbb{O}^n$ for short.

• The notation $\mathbb{T}$ denotes the transpose for the real case and the conjugate transpose
for the complex case. The notation $*$ means the adjoint of an operator.

• For any given vector $x$, $\mathrm{Diag}(x)$ denotes a rectangular diagonal matrix of suitable
size with the $i$-th diagonal entry being $x_i$.

• For any $x \in \mathbb{R}^n$, let $\|x\|_2$ and $\|x\|_\infty$ denote the Euclidean norm and the maximum
norm, respectively. For any $X \in \mathbb{V}^{n_1\times n_2}$, let $\|X\|$ and $\|X\|_*$ denote the spectral
norm and the nuclear norm, respectively.

• The notations $\xrightarrow{a.s.}$, $\xrightarrow{p}$ and $\xrightarrow{d}$ mean almost sure convergence, convergence in
probability and convergence in distribution, respectively. We write $X_m = O_p(1)$ if $X_m$ is
bounded in probability.

• For any set $K$, let $|K|$ denote the cardinality of $K$ and let $\delta_K(x)$ denote the indicator
function of $K$, i.e., $\delta_K(x) = 0$ if $x \in K$, and $\delta_K(x) = +\infty$ otherwise. Let $I_n$ denote
the $n \times n$ identity matrix.



2 Problem formulation 

In this section, we formulate the model of the matrix completion problem with fixed basis 
coefficients, and then propose a rank-correction step for solving this class of problems. 

2.1 The observation model 

Let $\{\Theta_1, \ldots, \Theta_d\}$ be a given orthonormal basis of the given real inner product space
$\mathbb{V}^{n_1\times n_2}$. Then, any matrix $X \in \mathbb{V}^{n_1\times n_2}$ can be uniquely expressed in the form
$X = \sum_{k=1}^{d} \langle \Theta_k, X \rangle \Theta_k$, where $\langle \Theta_k, X \rangle$ is called the basis coefficient of $X$ relative to
$\Theta_k$. Let $\overline{X} \in \mathbb{V}^{n_1\times n_2}$ be the unknown low-rank matrix to be recovered. In some practical
applications, for example, the correlation and density matrix completion, a few basis
coefficients of the unknown matrix $\overline{X}$ are fixed (or assumed to be fixed) due to a certain
structure or reliable prior information. Throughout this paper, we let $\alpha \subseteq \{1, 2, \ldots, d\}$
denote the set of the indices relative to which the basis coefficients are fixed, and $\beta$ denote
the complement of $\alpha$ in $\{1, 2, \ldots, d\}$, i.e., $\alpha \cap \beta = \emptyset$ and $\alpha \cup \beta = \{1, \ldots, d\}$. We define
$d_1 := |\alpha|$ and $d_2 := |\beta|$.

When a few basis coefficients are fixed, one only needs to observe the rest for
recovering the unknown matrix $\overline{X}$. Assume that we are given a collection of $m$ noisy
observations of the basis coefficients relative to $\{\Theta_k : k \in \beta\}$ in the following form

$$ y_i = \langle \Theta_{\omega_i}, \overline{X} \rangle + \nu \xi_i, \quad i = 1, \ldots, m, \tag{1} $$

where $\omega_i$ are the indices randomly sampled from the index set $\beta$, $\xi_i$ are the independent
and identically distributed (i.i.d.) noises with $\mathbb{E}(\xi_i) = 0$ and $\mathbb{E}(\xi_i^2) = 1$, and $\nu > 0$
controls the magnitude of noise. Unless otherwise stated, we assume a general weighted
sampling (with replacement) scheme with the sampling distribution of $\omega_i$ as follows.

Assumption 1 The indices $\omega_1, \ldots, \omega_m$ are i.i.d. copies of a random variable $\omega$ that has
a probability distribution $\Pi$ over $\{1, \ldots, d\}$ defined by

$$ \Pr(\omega = k) = \begin{cases} 0 & \text{if } k \in \alpha, \\ p_k > 0 & \text{if } k \in \beta. \end{cases} $$

Note that each $\Theta_k$, $k \in \beta$, is assumed to be sampled with a positive probability in this
sampling scheme. In particular, when the sampling probabilities of all $k \in \beta$ are equal,
i.e., $p_k = 1/d_2$ $\forall\, k \in \beta$, we say that the observations are sampled uniformly at random.
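The observation model (1) under Assumption 1 can be simulated directly in coefficient coordinates. The following is a minimal sketch, not the authors' code: the function name `sample_observations`, the dictionary `coeffs` of true basis coefficients, and the choice of standard Gaussian noise (which has mean zero and unit variance, as the model requires) are all assumptions made for illustration.

```python
import random


def sample_observations(coeffs, beta, probs, m, nu, seed=0):
    """Draw m noisy observations y_i = <Theta_{omega_i}, Xbar> + nu * xi_i.

    coeffs: dict mapping an index k in beta to the true coefficient <Theta_k, Xbar>
    beta:   list of indices whose coefficients are not fixed
    probs:  sampling probabilities p_k > 0 over beta (weighted, with replacement)
    nu:     noise magnitude; xi_i is standard Gaussian here (an assumption)
    """
    rng = random.Random(seed)
    # Weighted sampling with replacement from beta, as in Assumption 1.
    omega = rng.choices(beta, weights=probs, k=m)
    y = [coeffs[k] + nu * rng.gauss(0.0, 1.0) for k in omega]
    return omega, y


if __name__ == "__main__":
    true_coeffs = {2: 0.5, 3: -1.0, 4: 2.0}
    omega, y = sample_observations(true_coeffs, [2, 3, 4], [0.2, 0.3, 0.5], m=5, nu=0.1)
    print(omega, y)
```

Indices in the fixed set are never drawn because only the indices in `beta` are passed to the sampler, which mirrors the zero sampling probability on the fixed coefficients.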

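Next, we present some examples of low-rank matrix completion problems in the above
settings.

(1) Correlation matrix completion. A correlation matrix is an $n \times n$ real symmetric
or Hermitian positive semidefinite matrix with all diagonal entries being ones. Let
$e_i$ be the vector with the $i$-th entry being one and the others being zeros. Then,
$\langle e_i e_i^{\mathbb{T}}, \overline{X} \rangle = \overline{X}_{ii} = 1$ $\forall\, 1 \le i \le n$. The recovery of a correlation matrix is based
on the observations of entries. For the real case, $\mathbb{V}^{n_1\times n_2} = \mathcal{S}^n$, $d = n(n+1)/2$,
$d_1 = n$,

$$ \Theta_\alpha = \{ e_i e_i^{\mathbb{T}} \mid 1 \le i \le n \} \quad\text{and}\quad \Theta_\beta = \Big\{ \tfrac{1}{\sqrt{2}} (e_i e_j^{\mathbb{T}} + e_j e_i^{\mathbb{T}}) \;\Big|\; 1 \le i < j \le n \Big\}; $$

and for the complex case, $\mathbb{V}^{n_1\times n_2} = \mathcal{H}^n$, $d = n^2$, $d_1 = n$,

$$ \Theta_\alpha = \{ e_i e_i^{\mathbb{T}} \mid 1 \le i \le n \} \quad\text{and}\quad \Theta_\beta = \Big\{ \tfrac{1}{\sqrt{2}} (e_i e_j^{\mathbb{T}} + e_j e_i^{\mathbb{T}}),\ \tfrac{\sqrt{-1}}{\sqrt{2}} (e_i e_j^{\mathbb{T}} - e_j e_i^{\mathbb{T}}) \;\Big|\; i < j \Big\}. $$

Here, $\sqrt{-1}$ represents the imaginary unit. Of course, one may fix some off-diagonal
entries in specific applications.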

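(2) Density matrix completion. A density matrix of dimension $n = 2^l$ for some
positive integer $l$ is an $n \times n$ Hermitian positive semidefinite matrix with trace one.
In quantum state tomography, one aims to recover a density matrix from Pauli
measurements (observations of the coefficients relative to the Pauli basis) [29, 21],
given by

$$ \Theta_\alpha = \Big\{ \tfrac{1}{\sqrt{n}} I_n \Big\} \quad\text{and}\quad \Theta_\beta = \Big\{ \tfrac{1}{\sqrt{n}} (\sigma_{s_1} \otimes \cdots \otimes \sigma_{s_l}) \;\Big|\; (s_1, \ldots, s_l) \in \{0, 1, 2, 3\}^l \Big\} \setminus \Theta_\alpha, $$

where "$\otimes$" means the Kronecker product of two matrices and

$$ \sigma_0 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad \sigma_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad \sigma_2 = \begin{pmatrix} 0 & -\sqrt{-1} \\ \sqrt{-1} & 0 \end{pmatrix}, \quad \sigma_3 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} $$

are the Pauli matrices. In this setting, $\mathbb{V}^{n_1\times n_2} = \mathcal{H}^n$, $\mathrm{Tr}(\overline{X}) = \langle I_n, \overline{X} \rangle = 1$, $d = n^2$,
and $d_1 = 1$.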

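(3) Rectangular matrix completion. Assume that a few entries of a rectangular
matrix are known and let $\mathcal{I}$ be the index set of these entries. One aims to recover
this rectangular matrix from the observations of the remaining entries. In the real case,
$\mathbb{V}^{n_1\times n_2} = \mathbb{R}^{n_1\times n_2}$, $d = n_1 n_2$, $d_1 = |\mathcal{I}|$,

$$ \Theta_\alpha = \{ e_i e_j^{\mathbb{T}} \mid (i, j) \in \mathcal{I} \} \quad\text{and}\quad \Theta_\beta = \{ e_i e_j^{\mathbb{T}} \mid (i, j) \notin \mathcal{I} \}; $$

and in the complex case, $\mathbb{V}^{n_1\times n_2} = \mathbb{C}^{n_1\times n_2}$, $d = 2 n_1 n_2$, $d_1 = 2|\mathcal{I}|$,

$$ \Theta_\alpha = \{ e_i e_j^{\mathbb{T}},\ \sqrt{-1}\, e_i e_j^{\mathbb{T}} \mid (i, j) \in \mathcal{I} \} \quad\text{and}\quad \Theta_\beta = \{ e_i e_j^{\mathbb{T}},\ \sqrt{-1}\, e_i e_j^{\mathbb{T}} \mid (i, j) \notin \mathcal{I} \}. $$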
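To make the bases in examples (1) and (2) concrete, the sketch below builds the real symmetric basis and the scaled Pauli basis as nested lists and checks orthonormality under the trace inner product $\mathrm{Re}(\mathrm{Tr}(X^{\mathbb{T}} Y))$. It is an illustration only; the helper names (`inner`, `symmetric_basis`, `pauli_basis`, `kron`, `is_orthonormal`) are hypothetical and do not come from the paper.

```python
import itertools
import math


def inner(X, Y):
    """Trace inner product Re(Tr(X^H Y)) for matrices stored as nested lists."""
    return sum((complex(x).conjugate() * y).real
               for row_x, row_y in zip(X, Y) for x, y in zip(row_x, row_y))


def symmetric_basis(n):
    """Return (Theta_alpha, Theta_beta) for the real symmetric n x n case."""
    alpha, beta = [], []
    for i in range(n):
        E = [[0.0] * n for _ in range(n)]
        E[i][i] = 1.0
        alpha.append(E)
    s = 1.0 / math.sqrt(2.0)
    for i in range(n):
        for j in range(i + 1, n):
            E = [[0.0] * n for _ in range(n)]
            E[i][j] = E[j][i] = s
            beta.append(E)
    return alpha, beta


# Pauli matrices sigma_0, ..., sigma_3.
PAULI = [
    [[1, 0], [0, 1]],
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]


def kron(A, B):
    """Kronecker product of two matrices stored as nested lists."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


def pauli_basis(l):
    """Return (Theta_alpha, Theta_beta) for density matrices of size n = 2**l."""
    n = 2 ** l
    scale = 1.0 / math.sqrt(n)
    basis = []
    for s in itertools.product(range(4), repeat=l):
        M = [[1.0]]
        for idx in s:
            M = kron(M, PAULI[idx])
        basis.append([[scale * x for x in row] for row in M])
    # The first tuple is (0, ..., 0), i.e. the scaled identity.
    return basis[:1], basis[1:]


def is_orthonormal(basis, tol=1e-12):
    return all(abs(inner(A, B) - (1.0 if a == b else 0.0)) < tol
               for a, A in enumerate(basis) for b, B in enumerate(basis))


if __name__ == "__main__":
    alpha, beta = symmetric_basis(4)
    print(len(alpha), len(beta), is_orthonormal(alpha + beta))
    p_alpha, p_beta = pauli_basis(2)
    print(len(p_alpha), len(p_beta), is_orthonormal(p_alpha + p_beta))
```

The counts returned by the two constructors match $d_1$ and $d_2 = d - d_1$ from the text: $n$ and $n(n-1)/2$ in the real symmetric case, and $1$ and $n^2 - 1$ in the Pauli case.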

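Now we introduce some linear operators that are frequently used in the subsequent
sections. For any given index set $\pi \subseteq \{1, \ldots, d\}$, say $\alpha$ and $\beta$, we define the linear
operators $\mathcal{R}_\pi : \mathbb{V}^{n_1\times n_2} \to \mathbb{R}^{|\pi|}$ and $\mathcal{P}_\pi : \mathbb{V}^{n_1\times n_2} \to \mathbb{V}^{n_1\times n_2}$, respectively, by

$$ \mathcal{R}_\pi(X) := \big( \langle \Theta_k, X \rangle \big)^{\mathbb{T}}_{k \in \pi} \quad\text{and}\quad \mathcal{P}_\pi(X) := \sum_{k \in \pi} \langle \Theta_k, X \rangle \Theta_k, \quad X \in \mathbb{V}^{n_1\times n_2}. \tag{2} $$

It is easy to see that $\mathcal{P}_\pi = \mathcal{R}_\pi^* \mathcal{R}_\pi$. Define the self-adjoint operators $\mathcal{Q}_\beta : \mathbb{V}^{n_1\times n_2} \to \mathbb{V}^{n_1\times n_2}$
and $\mathcal{Q}_\beta^\dagger : \mathbb{V}^{n_1\times n_2} \to \mathbb{V}^{n_1\times n_2}$ associated with the sampling probability, respectively,
by

$$ \mathcal{Q}_\beta(X) := \sum_{k \in \beta} p_k \langle \Theta_k, X \rangle \Theta_k \quad\text{and}\quad \mathcal{Q}_\beta^\dagger(X) := \sum_{k \in \beta} \frac{1}{p_k} \langle \Theta_k, X \rangle \Theta_k, \quad X \in \mathbb{V}^{n_1\times n_2}. \tag{3} $$

One may easily verify that the operators $\mathcal{Q}_\beta$, $\mathcal{Q}_\beta^\dagger$ and $\mathcal{P}_\beta$ satisfy the following relations

$$ \mathcal{Q}_\beta \mathcal{Q}_\beta^\dagger = \mathcal{Q}_\beta^\dagger \mathcal{Q}_\beta = \mathcal{P}_\beta, \quad \mathcal{P}_\beta \mathcal{Q}_\beta = \mathcal{Q}_\beta \mathcal{P}_\beta = \mathcal{Q}_\beta, \quad \mathcal{Q}_\beta^\dagger \mathcal{R}_\alpha^* = 0. \tag{4} $$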
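As a brief check of the first relation in (4), the following worked derivation uses only the orthonormality of the basis, written here as $\langle \Theta_k, \Theta_j \rangle = \delta_{kj}$ (the Kronecker delta $\delta_{kj}$ is a symbol introduced just for this sketch), together with the fact that the basis coefficients are real numbers:

```latex
\begin{aligned}
\mathcal{Q}_\beta \mathcal{Q}_\beta^\dagger (X)
  &= \sum_{k \in \beta} p_k \Big\langle \Theta_k,\ \sum_{j \in \beta} \frac{1}{p_j} \langle \Theta_j, X \rangle \Theta_j \Big\rangle \Theta_k \\
  &= \sum_{k \in \beta} \sum_{j \in \beta} \frac{p_k}{p_j} \langle \Theta_j, X \rangle\, \delta_{kj}\, \Theta_k
   = \sum_{k \in \beta} \langle \Theta_k, X \rangle \Theta_k
   = \mathcal{P}_\beta (X).
\end{aligned}
```

The remaining relations follow in the same way, since every operator involved acts diagonally on the basis coefficients.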

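Let $\Omega$ be the multiset of all the sampled indices from the index set $\beta$, i.e., $\Omega = \{\omega_1, \ldots, \omega_m\}$.
With a slight abuse of notation, we define the sampling operator $\mathcal{R}_\Omega : \mathbb{V}^{n_1\times n_2} \to \mathbb{R}^m$
associated with $\Omega$ by

$$ \mathcal{R}_\Omega(X) := \big( \langle \Theta_{\omega_1}, X \rangle, \ldots, \langle \Theta_{\omega_m}, X \rangle \big)^{\mathbb{T}}, \quad X \in \mathbb{V}^{n_1\times n_2}. $$

Then, the observation model (1) can be expressed in the following vector form

$$ y = \mathcal{R}_\Omega(\overline{X}) + \nu \xi, \tag{5} $$

where $y = (y_1, \ldots, y_m)^{\mathbb{T}} \in \mathbb{R}^m$ and $\xi = (\xi_1, \ldots, \xi_m)^{\mathbb{T}} \in \mathbb{R}^m$ denote the observation vector
and the noise vector, respectively.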
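In coefficient coordinates the sampling operator and its adjoint are simple gather and scatter operations. The sketch below is an illustration under that assumption: a matrix is represented by its vector of basis coefficients, indices are 0-based, and the function names `sample_op` and `sample_op_adjoint` are hypothetical.

```python
def sample_op(x, omega):
    """R_Omega: coefficient vector x (length d) -> sampled coefficients (length m)."""
    return [x[k] for k in omega]


def sample_op_adjoint(z, omega, d):
    """R_Omega^*: vector z (length m) -> coefficient vector (length d).

    Repeated indices in omega accumulate, since sampling is with replacement.
    """
    out = [0.0] * d
    for zi, k in zip(z, omega):
        out[k] += zi
    return out


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    omega = [2, 3, 2]
    z = [0.5, -1.0, 2.0]
    # Adjoint identity: <R_Omega(x), z> equals <x, R_Omega^*(z)>.
    print(dot(sample_op(x, omega), z), dot(x, sample_op_adjoint(z, omega, len(x))))
```

The adjoint is the operator that appears later in terms such as $\mathcal{R}_\Omega^*(\xi)$, so checking the identity on a small example is a cheap sanity test for any implementation.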
2.2 The rank-correction step

In many situations, the nuclear norm is able to encourage a low-rank solution for matrix
recovery, but its efficiency may be challenged if the observations are sampled at random
obeying a general distribution such as the one considered in [65]. The setting of fixed
basis coefficients in our matrix completion model can be regarded as an extreme
sampling scheme. In particular, for the correlation and density matrix completion, the
nuclear norm completely loses its efficiency since in this case it reduces to a constant.
In order to overcome the shortcomings of the nuclear norm penalization, we propose a
rank-correction step to generate an estimator in pursuit of a better recovery performance.
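The claim that the nuclear norm is constant over correlation matrices can be seen in the smallest case. The sketch below is an illustration, not part of the paper: it uses the $2 \times 2$ correlation matrix with off-diagonal entry `rho`, whose eigenvalues are $1 + \rho$ and $1 - \rho$, and the function names are hypothetical.

```python
def nuclear_norm_2x2_correlation(rho):
    """Nuclear norm of [[1, rho], [rho, 1]], computed from its eigenvalues."""
    return abs(1.0 + rho) + abs(1.0 - rho)


def rank_2x2_correlation(rho, tol=1e-12):
    """Numerical rank of [[1, rho], [rho, 1]]."""
    eigenvalues = (1.0 + rho, 1.0 - rho)
    return sum(1 for lam in eigenvalues if abs(lam) > tol)


if __name__ == "__main__":
    for rho in (0.0, 0.5, 1.0):
        print(rho, nuclear_norm_2x2_correlation(rho), rank_2x2_correlation(rho))
```

For every admissible `rho` in $[-1, 1]$ the nuclear norm equals the trace, which is 2, while the rank drops from 2 to 1 at `rho = 1`. The nuclear norm therefore cannot distinguish the low-rank matrix from the full-rank ones in this class.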

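For convenience of discussion, in the rest of this paper, for any given $X \in \mathbb{V}^{n_1\times n_2}$,
we denote by $\sigma(X) = (\sigma_1(X), \ldots, \sigma_n(X))^{\mathbb{T}}$ the singular value vector of $X$ arranged in
nonincreasing order and define

$$ \mathbb{O}^{n_1, n_2}(X) := \{ (U, V) \in \mathbb{O}^{n_1} \times \mathbb{O}^{n_2} \mid X = U \mathrm{Diag}(\sigma(X)) V^{\mathbb{T}} \}. $$

In particular, when $\mathbb{V}^{n_1\times n_2} = \mathbb{S}^n$, we denote by $\lambda(X) = (\lambda_1(X), \ldots, \lambda_n(X))^{\mathbb{T}}$ the eigenvalue
vector of $X$ with $|\lambda_1(X)| \ge \ldots \ge |\lambda_n(X)|$ and define

$$ \mathbb{O}^{n}(X) := \{ P \in \mathbb{O}^{n} \mid X = P \mathrm{Diag}(\lambda(X)) P^{\mathbb{T}} \}. $$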

Before stating our rank-correction step, we introduce the concept of the spectral operator
associated with a symmetric vector-valued function. A function $f : \mathbb{R}^n \to \mathbb{R}^n$ is said to be
symmetric if

$$ f(x) = Q^{\mathbb{T}} f(Qx) \quad \forall\ \text{signed permutation matrix } Q \text{ and } x \in \mathbb{R}^n, $$

where a signed permutation matrix is a real matrix that contains exactly one nonzero
entry $1$ or $-1$ in each row and column and $0$ elsewhere. From this definition, we see that

$$ f_i(x) = 0 \quad \text{if } x_i = 0. $$
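The last implication can be verified in one line. The sketch below assumes the symmetry condition reads $f(x) = Q^{\mathbb{T}} f(Qx)$ and picks the signed permutation matrix that flips only the $i$-th coordinate (the name $Q_i$ is introduced here for illustration):

```latex
\begin{aligned}
&Q_i := \mathrm{Diag}(1, \ldots, 1, \underbrace{-1}_{i\text{-th}}, 1, \ldots, 1), \qquad
 x_i = 0 \ \Longrightarrow\ Q_i x = x, \\
&f(x) = Q_i^{\mathbb{T}} f(Q_i x) = Q_i f(x)
 \ \Longrightarrow\ f_i(x) = -f_i(x)
 \ \Longrightarrow\ f_i(x) = 0 .
\end{aligned}
```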

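The spectral operator $F : \mathbb{V}^{n_1\times n_2} \to \mathbb{V}^{n_1\times n_2}$ associated with the function $f$ is defined by

$$ F(X) := U \mathrm{Diag}(f(\sigma(X))) V^{\mathbb{T}}, \tag{6} $$

where $(U, V) \in \mathbb{O}^{n_1, n_2}(X)$ and $X \in \mathbb{V}^{n_1\times n_2}$. From [10, Theorems 3.1 & 3.6], the symmetry
of $f$ guarantees the well-definedness of the spectral operator $F$, and the continuous
differentiability of $f$ implies the continuous differentiability of $F$. When $\mathbb{V}^{n_1\times n_2} = \mathbb{S}^n$,
we have that

$$ F(X) = P \mathrm{Diag}(f(|\lambda(X)|)) (P \mathrm{Diag}(s(X)))^{\mathbb{T}}, $$

where $P \in \mathbb{O}^n(X)$ and $s(X) \in \mathbb{R}^n$ with the $i$-th component $s_i(X) = -1$ if $\lambda_i(X) < 0$
and $s_i(X) = 1$ otherwise. In particular, for the positive semidefinite case, both $U$ and $V$
in (6) reduce to $P$. For more details on spectral operators, the reader may refer to the
PhD thesis [10].

Given a spectral operator $F : \mathbb{V}^{n_1\times n_2} \to \mathbb{V}^{n_1\times n_2}$ and an initial estimator $\widetilde{X}_m$ for the
unknown matrix $\overline{X}$, say the nuclear norm penalized least squares estimator or one of its
analogues, our rank-correction step is to solve the convex optimization problem

$$ \begin{array}{cl} \min\limits_{X \in \mathbb{V}^{n_1\times n_2}} & \dfrac{1}{2m} \|y - \mathcal{R}_\Omega(X)\|_2^2 + \rho_m \Big( \|X\|_* - \langle F(\widetilde{X}_m), X \rangle + \dfrac{\gamma_m}{2} \|X - \widetilde{X}_m\|_F^2 \Big) \\[2mm] \text{s.t.} & \mathcal{R}_\alpha(X) = \mathcal{R}_\alpha(\overline{X}), \end{array} \tag{7} $$

where $\rho_m > 0$ and $\gamma_m \ge 0$ are the regularization parameters depending on the number
of observations. The last quadratic proximal term is added to guarantee the boundedness
of the solution to (7). If the function $\|X\|_* - \langle F(\widetilde{X}_m), X \rangle$ is level-bounded, one may
simply set $\gamma_m = 0$. Clearly, when $F \equiv 0$ and $\gamma_m = 0$, the problem (7) reduces to the
nuclear norm penalized least squares problem. In the sequel, we call $-\langle F(\widetilde{X}_m), X \rangle$ the
rank-correction term. If the true matrix is known to be positive semidefinite, we add the
constraint $X \in \mathbb{S}^n_+$ to (7). Thus, the rank-correction step is to solve the convex conic
optimization problem

$$ \begin{array}{cl} \min\limits_{X \in \mathbb{S}^n} & \dfrac{1}{2m} \|y - \mathcal{R}_\Omega(X)\|_2^2 + \rho_m \Big( \|X\|_* - \langle F(\widetilde{X}_m), X \rangle + \dfrac{\gamma_m}{2} \|X - \widetilde{X}_m\|_F^2 \Big) \\[2mm] \text{s.t.} & \mathcal{R}_\alpha(X) = \mathcal{R}_\alpha(\overline{X}), \quad X \in \mathbb{S}^n_+. \end{array} \tag{8} $$

For this case, we assume that the initial estimator $\widetilde{X}_m$ belongs to $\mathbb{S}^n_+$, as the projection
of any estimator onto $\mathbb{S}^n_+$ can approximate the true matrix $\overline{X}$ better.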

The rank-correction step above is inspired by the majorized penalty approach recently
proposed by Gao and Sun [25] for solving the rank constrained matrix optimization
problem:

$$ \min \{ h(X) : \mathrm{rank}(X) \le r,\ X \in C \}, \tag{9} $$

where $r \ge 1$, $h : \mathbb{V}^{n_1\times n_2} \to \mathbb{R}$ is a given continuous function and $C \subseteq \mathbb{V}^{n_1\times n_2}$ is a closed
convex set. Note that for any $X \in \mathbb{V}^{n_1\times n_2}$, the constraint $\mathrm{rank}(X) \le r$ is equivalent to

$$ 0 = \sigma_{r+1}(X) + \cdots + \sigma_n(X) = \|X\|_* - \|X\|_{(r)}, $$

where $\|X\|_{(r)} := \sigma_1(X) + \cdots + \sigma_r(X)$ denotes the Ky Fan $r$-norm. The central idea of
the majorized penalty approach is to solve the following penalized version of (9):

$$ \min_{X \in C}\ h(X) + \rho \big( \|X\|_* - \|X\|_{(r)} \big), $$

where $\rho > 0$ is the penalty parameter. With the current iterate $X^k$, the majorized penalty
approach yields the next iterate $X^{k+1}$ by solving the convex optimization problem

$$ \min_{X \in C}\ \widehat{h}^k(X) + \rho \Big( \|X\|_* - \langle G^k, X \rangle + \frac{\gamma_k}{2} \|X - X^k\|_F^2 \Big), \tag{10} $$

where $\gamma_k \ge 0$, $G^k$ is a subgradient of the convex function $\|X\|_{(r)}$ at $X^k$, and $\widehat{h}^k$ is a
convex majorization function of $h$ at $X^k$. Comparing with (7), one may notice that our
proposed rank-correction step is close to one step of the majorized penalty approach.
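The equivalence between the rank constraint and the vanishing gap between the nuclear norm and the Ky Fan $r$-norm depends only on the singular values. The following minimal sketch works directly on a list of nonnegative singular values; the function names are hypothetical and chosen for illustration.

```python
def nuclear_norm(sigma):
    """Sum of all singular values."""
    return sum(sigma)


def ky_fan_norm(sigma, r):
    """Ky Fan r-norm: sum of the r largest singular values."""
    return sum(sorted(sigma, reverse=True)[:r])


def rank_gap(sigma, r):
    """||X||_* - ||X||_(r); it is zero exactly when rank(X) <= r."""
    return nuclear_norm(sigma) - ky_fan_norm(sigma, r)


if __name__ == "__main__":
    sigma = [3.0, 1.0, 0.0, 0.0]  # singular values of a rank-2 matrix
    print(rank_gap(sigma, 2), rank_gap(sigma, 1))
```

For the rank-2 example the gap vanishes for `r = 2` and is strictly positive for `r = 1`, which is why the gap can serve as a penalty for violating the rank constraint.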
Due to the structured randomness of matrix completion, we expect that the estimator
generated from the rank-correction step possesses some favorable properties for recovery.
The key issue is how to construct the rank-correction function $F$ to make such improvements
possible. In the next two sections, we provide theoretical support for our proposed
rank-correction step, from which some important guidelines on the construction of $F$ can
be captured.

Henceforth, we let $\widehat{X}_m$ denote the estimator generated from the rank-correction step
(7) or (8) for the corresponding cases and let $r = \mathrm{rank}(\overline{X}) \ge 1$. For any $X \in \mathbb{V}^{n_1\times n_2}$
and any $(U, V) \in \mathbb{O}^{n_1, n_2}(X)$, we write $U = [U_1\ U_2]$ and $V = [V_1\ V_2]$ with $U_1 \in \mathbb{O}^{n_1 \times r}$,
$U_2 \in \mathbb{O}^{n_1 \times (n_1 - r)}$, $V_1 \in \mathbb{O}^{n_2 \times r}$ and $V_2 \in \mathbb{O}^{n_2 \times (n_2 - r)}$. In particular, for any $X \in \mathbb{S}^n_+$ and
any $P \in \mathbb{O}^n(X)$, we write $P = [P_1\ P_2]$ with $P_1 \in \mathbb{O}^{n \times r}$ and $P_2 \in \mathbb{O}^{n \times (n - r)}$.



3 Error bounds 

In this section, we aim to derive a recovery error bound in the Frobenius norm for the 
rank-correction step and discuss the impact of the rank-correction term on the obtained 
bound. The following analysis focuses on the rectangular case. All the results obtained 
in this section are applicable to the positive semidefinite case since adding more prior 
information can only improve recoverability. 

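We first introduce the orthogonal decomposition $\mathbb{V}^{n_1\times n_2} = T \oplus T^\perp$ with

$$ \begin{aligned} T &:= \{ X \in \mathbb{V}^{n_1\times n_2} \mid X = X_1 + X_2 \text{ with } \mathrm{col}(X_1) \subseteq \mathrm{col}(\overline{X}),\ \mathrm{row}(X_2) \subseteq \mathrm{row}(\overline{X}) \}, \\ T^\perp &:= \{ X \in \mathbb{V}^{n_1\times n_2} \mid \mathrm{row}(X) \perp \mathrm{row}(\overline{X}) \text{ and } \mathrm{col}(X) \perp \mathrm{col}(\overline{X}) \}, \end{aligned} $$

where $\mathrm{row}(X)$ and $\mathrm{col}(X)$ denote the row space and column space of the matrix $X$,
respectively. Let $\mathcal{P}_T : \mathbb{V}^{n_1\times n_2} \to \mathbb{V}^{n_1\times n_2}$ and $\mathcal{P}_{T^\perp} : \mathbb{V}^{n_1\times n_2} \to \mathbb{V}^{n_1\times n_2}$ be the orthogonal
projection operators onto the subspaces $T$ and $T^\perp$, respectively. It is not hard to verify
that

$$ \mathcal{P}_T(X) = U_1 U_1^{\mathbb{T}} X + X V_1 V_1^{\mathbb{T}} - U_1 U_1^{\mathbb{T}} X V_1 V_1^{\mathbb{T}} \quad\text{and}\quad \mathcal{P}_{T^\perp}(X) = U_2 U_2^{\mathbb{T}} X V_2 V_2^{\mathbb{T}} \tag{11} $$

for any $X \in \mathbb{V}^{n_1\times n_2}$ and $(U, V) \in \mathbb{O}^{n_1, n_2}(\overline{X})$. Define $a_m$ and $b_m$, respectively, by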

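$$ a_m := \| U_1 V_1^{\mathbb{T}} - \mathcal{P}_T(F(\widetilde{X}_m) + \gamma_m \widetilde{X}_m) \| \quad\text{and}\quad b_m := 1 - \| \mathcal{P}_{T^\perp}(F(\widetilde{X}_m) + \gamma_m \widetilde{X}_m) \|. \tag{12} $$

Note that the first term in the objective function of (7) can be rewritten as

$$ \frac{1}{2m} \|y - \mathcal{R}_\Omega(X)\|_2^2 = \frac{1}{2m} \|\mathcal{R}_\Omega(X - \overline{X})\|_2^2 - \frac{\nu}{m} \langle \mathcal{R}_\Omega^*(\xi), X \rangle + c_0, $$

where $c_0 := \frac{\nu}{m} \langle \mathcal{R}_\Omega^*(\xi), \overline{X} \rangle + \frac{\nu^2}{2m} \|\xi\|_2^2$ does not depend on $X$.
Using the optimality of $\widehat{X}_m$ to the problem (7), we obtain the following result.

Theorem 1 Assume that $\|\mathcal{P}_{T^\perp}(F(\widetilde{X}_m) + \gamma_m \widetilde{X}_m)\| < 1$. For any given $\kappa > 1$, if

$$ \rho_m \ge \frac{\kappa \nu}{b_m} \Big\| \frac{1}{m} \mathcal{R}_\Omega^*(\xi) \Big\|, \tag{13} $$

then the following inequality holds:

$$ \frac{1}{2m} \|\mathcal{R}_\Omega(\widehat{X}_m - \overline{X})\|_2^2 \le \sqrt{2r} \Big( a_m + \frac{b_m}{\kappa} \Big) \rho_m \|\widehat{X}_m - \overline{X}\|_F + \frac{\rho_m \gamma_m}{2} \big( \|\overline{X}\|_F^2 - \|\widehat{X}_m\|_F^2 \big). \tag{14} $$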



Theorem 1 shows that, to derive an error bound on $\|\widehat{X}_m - \overline{X}\|_F$, we only need to
establish the relation between $\|\widehat{X}_m - \overline{X}\|_F^2$ and $\frac{1}{m} \|\mathcal{R}_\Omega(\widehat{X}_m - \overline{X})\|_2^2$. It is well known
that the sampling operator $\mathcal{R}_\Omega$ does not satisfy the RIP, but it has a similar property
with high probability under certain conditions (see, e.g., [54], W\\, [37], [45]). For deriving
such a property, here, we impose a bound restriction on the true matrix $\overline{X}$ in the form
of $\|\mathcal{R}_\beta(\overline{X})\|_\infty \le c$. This condition is very mild since a bound is often known in some
applications such as the correlation and density matrix completion. Correspondingly,
we add the bound constraint $\|\mathcal{R}_\beta(X)\|_\infty \le c$ to the problem (7) in the rank-correction
step. Since the feasible set is bounded in this case, we simply set $\gamma_m = 0$ and let $\widehat{X}_m^c$
denote the estimator generated from the rank-correction step in this case.

The above boundedness setting is similar to the one adopted by Klopp [37] for the
nuclear norm penalized least squares estimator. A slight difference is that the upper
bound is imposed on the basis coefficients of $\overline{X}$ relative to $\{\Theta_k : k \in \beta\}$ rather than on all
the entries of $\overline{X}$. It is easy to see that if the bound $c$ is not too tight, the estimator $\widehat{X}_m^c$ is
the same as $\widehat{X}_m$. Therefore, we next derive the recovery error bound of $\widehat{X}_m^c$ instead of
$\widehat{X}_m$, by following Klopp's arguments in [37], which are also in line with the work done
by Negahban and Wainwright [54].

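Let $\mu_1$ be a constant to control the smallest sampling probability for observations as

$$ p_k \ge (\mu_1 d_2)^{-1} \quad \forall\, k \in \beta. \tag{15} $$

It follows from Assumption 1 that $\mu_1 \ge 1$ and in particular $\mu_1 = 1$ for the uniform
sampling. Note that the magnitude of $\mu_1$ does not depend on $d_2$ or the matrix size. By
the definition of $\mathcal{Q}_\beta$, we then have

$$ \langle \mathcal{Q}_\beta(\Delta), \Delta \rangle \ge (\mu_1 d_2)^{-1} \|\Delta\|_F^2 \quad \forall\, \Delta \in \{ \Delta \in \mathbb{V}^{n_1\times n_2} \mid \mathcal{R}_\alpha(\Delta) = 0 \}. \tag{16} $$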
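For a given sampling distribution, the smallest constant satisfying (15) is $1 / (d_2 \min_k p_k)$, and (16) can then be checked in coefficient coordinates. The sketch below is illustrative only; the function names `mu1` and `weighted_energy` are hypothetical, and a matrix with vanishing fixed coefficients is represented by its coefficient vector over the sampled index set.

```python
def mu1(probs):
    """Smallest mu_1 with p_k >= 1 / (mu_1 * d_2) for all k, given probs over beta."""
    d2 = len(probs)
    return 1.0 / (d2 * min(probs))


def weighted_energy(probs, coeffs):
    """<Q_beta(Delta), Delta> = sum_k p_k * <Theta_k, Delta>^2 in coefficient form."""
    return sum(p * c * c for p, c in zip(probs, coeffs))


if __name__ == "__main__":
    uniform = [0.25, 0.25, 0.25, 0.25]
    skewed = [0.5, 0.25, 0.125, 0.125]
    delta = [1.0, -2.0, 0.5, 3.0]
    frob_sq = sum(c * c for c in delta)
    print(mu1(uniform), mu1(skewed))
    # Inequality (16): weighted energy is at least ||Delta||_F^2 / (mu_1 * d_2).
    print(weighted_energy(skewed, delta) >= frob_sq / (mu1(skewed) * len(skewed)))
```

Uniform sampling gives the value 1, and any departure from uniformity increases the constant, which is how heavy non-uniform sampling enters the error bounds that follow.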

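Let $\{\epsilon_1, \ldots, \epsilon_m\}$ be an i.i.d. Rademacher sequence, i.e., an i.i.d. sequence of Bernoulli
random variables taking the values $1$ and $-1$ with probability $1/2$. Define

$$ \vartheta_m := \mathbb{E} \Big\| \frac{1}{m} \mathcal{R}_\Omega^*(\epsilon) \Big\| \quad \text{with } \epsilon = (\epsilon_1, \ldots, \epsilon_m)^{\mathbb{T}}. \tag{17} $$

Then, we can obtain a result similar to [37, Lemma 12] by showing that the sampling
operator $\mathcal{R}_\Omega$ satisfies some approximate RIP for the matrices in the following set

$$ C(r) := \Big\{ \Delta \in \mathbb{V}^{n_1\times n_2} \;\Big|\; \mathcal{R}_\alpha(\Delta) = 0,\ \|\mathcal{R}_\beta(\Delta)\|_\infty = 1,\ \|\Delta\|_* \le \sqrt{r}\, \|\Delta\|_F,\ \langle \mathcal{Q}_\beta(\Delta), \Delta \rangle \ge \sqrt{\frac{64 \log(n_1 + n_2)}{\log(2)\, m}} \Big\}. $$

Lemma 2 For all matrices $\Delta \in C(r)$, with probability at least $1 - 2/(n_1 + n_2)$, we have

$$ \frac{1}{m} \|\mathcal{R}_\Omega(\Delta)\|_2^2 \ge \frac{1}{2} \langle \mathcal{Q}_\beta(\Delta), \Delta \rangle - 128 \mu_1 d_2 r \vartheta_m^2. $$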

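Now, by combining Theorem 1 and Lemma 2, we obtain the following result.

Theorem 3 Assume that $\|\mathcal{P}_{T^\perp}(F(\widetilde{X}_m))\| < 1$ and $\|\mathcal{R}_\beta(\overline{X})\|_\infty \le c$ for some constant $c$.
If $\rho_m$ is chosen to satisfy (13), then there exists a numerical constant $C$ such that

$$ \frac{\|\widehat{X}_m^c - \overline{X}\|_F^2}{d_2} \le C \max \Big\{ \mu_1^2 d_2 r \Big( \Big( a_m + \frac{b_m}{\kappa} \Big)^2 \rho_m^2 + \Big( \frac{\kappa}{\kappa - 1} \Big)^2 \Big( 1 + \frac{a_m}{b_m} \Big)^2 c^2 \vartheta_m^2 \Big),\ c^2 \mu_1 \sqrt{\frac{\log(n_1 + n_2)}{m}} \Big\} $$

with probability at least $1 - 2/(n_1 + n_2)$.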

In order to choose a parameter $\rho_m$ such that (13) holds, we need to estimate $\| \frac{1}{m} \mathcal{R}_\Omega^*(\xi) \|$.
For this purpose, we make the following assumption on the noises.

Assumption 2 The i.i.d. noise variables $\xi_i$ are sub-exponential, i.e., there exist positive
constants $c_1$, $c_2$ and $c_3$ such that for all $t > 0$, $\Pr(|\xi_i| \ge t) \le c_1 \exp(-c_2 t^{c_3})$.

The noncommutative Bernstein inequality is a useful tool for the study of matrix
completion problems. It provides bounds on the probability that the sum of random
matrices deviates from its mean in the operator norm (see, e.g., [59, 68, 28]). Recently, the
noncommutative Bernstein inequality was extended by replacing bounds on the operator
norm of matrices with bounds on the Orlicz norms (see [40, 41]). Given any $s \ge 1$, the
$\psi_s$ Orlicz norm of a random variable $\theta$ is defined by

$$ \|\theta\|_{\psi_s} := \inf \{ t > 0 \mid \mathbb{E} \exp(|\theta|^s / t^s) \le 2 \}. $$
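For a random variable with finitely many values, the Orlicz norm can be computed numerically because the expectation in the definition is decreasing in $t$. The sketch below is an illustration under that assumption; the function name `orlicz_norm` and the bisection tolerance are hypothetical choices, not part of the paper.

```python
import math


def orlicz_norm(values, probs, s=1.0, tol=1e-10):
    """psi_s Orlicz norm of a discrete random variable with P(theta = values[i]) = probs[i]."""

    def moment(t):
        # E exp(|theta|^s / t^s); treat numerical overflow as +infinity.
        try:
            return sum(p * math.exp((abs(v) / t) ** s) for v, p in zip(values, probs))
        except OverflowError:
            return math.inf

    hi = max(abs(v) for v in values)
    if hi == 0.0:
        return 0.0
    # Grow hi until the constraint E exp(...) <= 2 is satisfied.
    while moment(hi) > 2.0:
        hi *= 2.0
    lo = 0.0
    # Bisection: moment is decreasing in t, so the feasible set is [t*, infinity).
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if moment(mid) <= 2.0:
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    # Rademacher variable: E exp(1/t) <= 2 exactly when t >= 1 / log(2).
    print(orlicz_norm([1.0, -1.0], [0.5, 0.5], s=1.0), 1.0 / math.log(2.0))
```

For a Rademacher variable the defining condition reduces to $\exp(1/t^s) \le 2$, so the norm is $(1/\log 2)^{1/s}$, which gives a closed form to compare against.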



The Orlicz norms are useful to characterize the tail behavior of random variables. The
following noncommutative Bernstein inequality is taken from [39, Corollary 2.1].



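Proposition 4 Let $Z_1, \ldots, Z_m \in \mathbb{V}^{n_1\times n_2}$ be independent random matrices with mean
zero. Suppose that $\max \big\{ \big\| \|Z_i\| \big\|_{\psi_s},\ 2\, \mathbb{E}^{1/2}(\|Z_i\|^2) \big\} \le W_s$ for some constant $W_s$. Define

$$ \sigma_Z := \max \Big\{ \Big\| \frac{1}{m} \sum_{i=1}^{m} \mathbb{E}(Z_i Z_i^{\mathbb{T}}) \Big\|^{1/2},\ \Big\| \frac{1}{m} \sum_{i=1}^{m} \mathbb{E}(Z_i^{\mathbb{T}} Z_i) \Big\|^{1/2} \Big\}. $$

Then, there exists a constant $C$ such that for all $t > 0$, with probability at least $1 - \exp(-t)$,

$$ \Big\| \frac{1}{m} \sum_{i=1}^{m} Z_i \Big\| \le C \max \Big\{ \sigma_Z \sqrt{\frac{t + \log(n_1 + n_2)}{m}},\ W_s \Big( \log \frac{W_s}{\sigma_Z} \Big)^{1/s} \frac{t + \log(n_1 + n_2)}{m} \Big\}. $$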



It is known that a random variable is sub-exponential if and only if its $\psi_1$ Orlicz norm
is finite [52]. To apply the noncommutative Bernstein inequality, we let $\mu_2$ be a constant
such that

$$ \max \Big\{ \Big\| \sum_{k \in \beta} p_k \Theta_k \Theta_k^{\mathbb{T}} \Big\|,\ \Big\| \sum_{k \in \beta} p_k \Theta_k^{\mathbb{T}} \Theta_k \Big\| \Big\} \le \frac{\mu_2}{n}. \tag{18} $$

Notice that since $\mathrm{Tr}\big( \sum_{k \in \beta} p_k \Theta_k \Theta_k^{\mathbb{T}} \big) = \mathrm{Tr}\big( \sum_{k \in \beta} p_k \Theta_k^{\mathbb{T}} \Theta_k \big) = 1$, the lower bound of the
term on the left-hand side is $1/n$. This implies that $\mu_2 \ge 1$. In the following, we also
assume that the magnitude of $\mu_2$ does not depend on the matrix size. For example,
$\mu_2 = 1$ for the correlation matrix completion under uniform sampling and the density
matrix completion described in Section 2. The following result extends [41, Lemma 2]
and [37, Lemmas 5 & 6] from the standard basis to an arbitrary orthonormal basis. A
similar result can also be found in [54, Lemma 6].

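Lemma 5 Under Assumption^ there exists a constant $C^*$ (only depending on the $\psi_1$
Orlicz norm of $\xi_k$) such that for all $t > 0$, with probability at least $1-\exp(-t)$,

$$\Big\|\frac{1}{m}\mathcal{R}_\Omega^*(\xi)\Big\| \le C^*\max\Big\{\sqrt{\frac{\mu_2(t+\log(n_1+n_2))}{mn}},\ \frac{\log(n)(t+\log(n_1+n_2))}{m}\Big\}. \tag{19}$$

In particular, when $m \ge n\log^3(n_1+n_2)/\mu_2$, we also have

$$\mathbb{E}\Big\|\frac{1}{m}\mathcal{R}_\Omega^*(\xi)\Big\| \le C^*\sqrt{\frac{2e\mu_2\log(n_1+n_2)}{mn}}. \tag{20}$$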
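As a sanity check on the constant $\mu_2$ in (18), consider the simplest setting. Everything here is an illustrative assumption rather than a case worked out in the text: no coefficient is fixed, the basis is the standard one, $\Theta_{(i,j)} = e_ie_j^{T}$ in $\mathbb{R}^{n_1\times n_2}$, sampling is uniform with $p_{(i,j)} = 1/(n_1n_2)$, and $n = \min\{n_1,n_2\}$.

```latex
\sum_{i=1}^{n_1}\sum_{j=1}^{n_2}\frac{1}{n_1n_2}\,(e_ie_j^{T})(e_ie_j^{T})^{T}
   = \frac{1}{n_1n_2}\sum_{i=1}^{n_1} n_2\, e_ie_i^{T} = \frac{1}{n_1}\,I_{n_1},
\qquad
\sum_{i=1}^{n_1}\sum_{j=1}^{n_2}\frac{1}{n_1n_2}\,(e_ie_j^{T})^{T}(e_ie_j^{T})
   = \frac{1}{n_2}\,I_{n_2},
\qquad
\max\Big\{\frac{1}{n_1},\frac{1}{n_2}\Big\} = \frac{1}{\min\{n_1,n_2\}} = \frac{1}{n}.
```

Both matrices have trace one, and the larger operator norm equals $1/n$, so in this setting the smallest admissible constant is $\mu_2 = 1$, in line with the lower bound $\mu_2 \ge 1$.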



Since Bernoulli random variables are sub-exponential, the right-hand side of (20)
provides an upper bound of $\vartheta_m$ defined by (17). Now, we choose
$t = \log(n_1+n_2)$ in Lemma 5 to achieve an optimal order bound. With this choice, when
$m \ge 2n\log^3(n_1+n_2)/\mu_2$, the first term in the maximum of (19) dominates the
second term. Hence, for any given $\kappa > 1$, by choosing

$$\rho_m = \frac{\kappa\nu}{b_m}\,C^*\sqrt{\frac{2\mu_2\log(n_1+n_2)}{mn}} \tag{21}$$

from Theorem 3 and Lemma 5 we obtain the following main result for the recovery error
bound.
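Theorem 6 Assume that $\|\mathcal{P}_{T^\perp}(F(\widetilde X_m))\| < 1$,
$\|\mathcal{R}_\beta(\overline X)\|_\infty \le c$ for some constant $c$, and
Assumption^ holds. For any given $\kappa > 1$, if $\rho_m$ is chosen according to (21),
then there exists a numerical constant $C$ such that, when
$m \ge n\log^3(n_1+n_2)/\mu_2$,

$$\frac{\|\widehat X_m-\overline X\|_F^2}{d_2} \le C\max\Big\{\Big[(C^*)^2\nu^2\Big(1+\kappa\frac{a_m}{b_m}\Big)^2 + \Big(\frac{\kappa}{\kappa-1}\Big)^2\Big(1+\frac{a_m}{b_m}\Big)^2c^2\Big]\frac{\mu_1^2\mu_2d_2r\log(n_1+n_2)}{mn},\ c^2\mu_1\sqrt{\frac{\log(n_1+n_2)}{m}}\Big\} \tag{22}$$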



with probability at least $1 - 3/(n_1+n_2)$.

When the matrix size is large, the second term in the maximum of (22) is negligible
compared with the first term. Thus, Theorem 6 indicates that for any rank-correction
function such that $\|\mathcal{P}_{T^\perp}(F(\widetilde X_m))\| < 1$, one only needs
samples with size of order $d_2 r\log(n_1+n_2)/n$ to control the recovery error. Note
that $d_2$ is of order $n_1n_2$ in general. Hence, the order of sample size needed is
roughly the degree of freedom of a rank $r$ matrix up to a logarithmic factor in the
matrix size. In addition, it is very interesting to notice that the value of $\kappa$
(or the value of $\rho_m$) has a substantial influence on the recovery error bound. The
first term in the maximum of (22) is a sum of two parts related to $\nu$ and $c$,
respectively. The part related to $\nu$ will increase as $\kappa$ increases provided
$a_m/b_m > 0$, while the part related to $c$ will slightly decrease to its limit as
$\kappa$ increases.
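The comparison between the sample-size order $d_2 r\log(n_1+n_2)/n$ and the degrees of freedom of a rank-$r$ matrix can be made concrete with a few lines of arithmetic. The sketch below is a rough illustration, not part of the analysis: it drops all constants, assumes $n = \min\{n_1,n_2\}$, takes $d_2 = n_1n_2$ when no coefficient is fixed, and uses hypothetical function names.

```python
import math

def sample_size_order(n1, n2, r, d2=None):
    """Order d2 * r * log(n1 + n2) / n with constants dropped (n = min(n1, n2) assumed)."""
    n = min(n1, n2)
    if d2 is None:
        d2 = n1 * n2              # assumption: no basis coefficient is fixed
    return d2 * r * math.log(n1 + n2) / n

def degrees_of_freedom(n1, n2, r):
    """Number of free parameters of an n1 x n2 matrix of rank r."""
    return r * (n1 + n2 - r)

order = sample_size_order(1000, 1000, 5)
dof = degrees_of_freedom(1000, 1000, 5)
ratio = order / dof               # the gap is driven by the logarithmic factor
```

For a square 1000 x 1000 matrix of rank 5, the two quantities differ by a single-digit factor, which reflects the "up to a logarithmic factor" statement.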

Theorem 6 also reveals the impact of the rank-correction term on recovery error.
Note that the value of $a_m/b_m$ fully depends on the rank-correction function $F$ when
an initial estimator $\widetilde X_m$ is given. A smaller value of $a_m/b_m$ brings a
smaller error bound and potentially leads to a smaller recovery error for the
rank-correction step. Note that for any given $\varepsilon_1 \ge 0$ and
$0 \le \varepsilon_2 < 1$, we have

$$\frac{a_m}{b_m} \le \frac{\varepsilon_1}{1-\varepsilon_2} \quad\text{if}\quad \|\mathcal{P}_T(F(\widetilde X_m)) - U_1V_1^{\mathbb T}\|_F \le \varepsilon_1 \ \text{ and } \ \|\mathcal{P}_{T^\perp}(F(\widetilde X_m))\| \le \varepsilon_2.$$

In particular, if $F \equiv 0$, then the estimator of the rank-correction step reduces to
the nuclear norm penalized least squares estimator with $a_m/b_m = 1$. Thus, Theorem 6
shows that, with a suitable rank-correction function $F$, the estimator generated from
the rank-correction step is very likely to perform better than the nuclear norm
penalized least squares estimator. In addition, this observation also provides us clues
on how to construct a good rank-correction function, to be discussed in Section 5.



4 Rank consistency

In this section we study the asymptotic behavior of the rank of the estimator
$\widehat X_m$ for both the rectangular case and the positive semidefinite case.
Theorem 6 shows that under mild conditions, the distribution of $\widehat X_m$ becomes
more and more concentrated around the true matrix $\overline X$. Due to the low-rank
structure of $\overline X$, we expect that the estimator $\widehat X_m$ has the same
low-rank property as $\overline X$. For this purpose, we consider the rank consistency
in the sense of Bach [3] under the setting that the matrix size is fixed.

Definition 1 An estimator $\widehat X_m$ of the true matrix $\overline X$ is said to be
rank consistent if

$$\lim_{m\to\infty}\Pr\big(\mathrm{rank}(\widehat X_m) = \mathrm{rank}(\overline X)\big) = 1.$$

Throughout this section we make the following assumptions:

Assumption 3 The spectral operator $F$ is continuous at $\overline X$.

Assumption 4 The initial estimator $\widetilde X_m$ satisfies
$\widetilde X_m \xrightarrow{p} \overline X$ as $m \to \infty$.

In addition, we also need the following properties of the operator $\mathcal{R}_\Omega$
and its adjoint $\mathcal{R}_\Omega^*$.

Lemma 7 (i) For any given $X \in \mathbb{V}^{n_1\times n_2}$, the random matrix
$\frac{1}{m}\mathcal{R}_\Omega^*\mathcal{R}_\Omega(X) \xrightarrow{a.s.} \mathcal{Q}_\beta(X)$.

(ii) The random vector
$\frac{1}{\sqrt{m}}\mathcal{R}_{\alpha\cup\beta}\mathcal{R}_\Omega^*(\xi) \xrightarrow{d} N(0, \mathrm{Diag}(p))$,
where $p = (p_1,\ldots,p_d)^{\mathbb T}$.

Epi-convergence in distribution is useful in proving the convergence in distribution of
minimizers or $\epsilon_m$-minimizers. The following epi-convergence result is taken
from [38].

Proposition 8 Let $\{\Phi_m\}$ be a sequence of random lower-semicontinuous functions
that epi-converges in distribution to $\Phi$. Assume that

(i) $\hat x_m$ is an $\epsilon_m$-minimizer of $\Phi_m$, i.e.,
$\Phi_m(\hat x_m) \le \inf \Phi_m(x) + \epsilon_m$, where $\epsilon_m \xrightarrow{p} 0$;

(ii) $\hat x_m = O_p(1)$;

(iii) the function $\Phi$ has a unique minimizer $\bar x$.

Then, $\hat x_m \xrightarrow{d} \bar x$. In addition, if $\Phi$ is a deterministic
function, then $\hat x_m \xrightarrow{p} \bar x$.

We know from [27] that $\hat x_m$ is guaranteed to be $O_p(1)$ when all $\Phi_m$ are
convex functions and $\Phi$ has a unique minimizer. For more details on epi-convergence
in distribution, one may refer to King and Wets [35], Geyer [26], Pflug [51] and
Knight [38]. In order to apply the epi-convergence theorem to a constrained optimization
problem, we need to transform the constrained optimization problem into an unconstrained
one by using the indicator function of the feasible set. This leads to the
epi-convergence issue of the sum of two sequences of functions. Thus, we need the
following epi-convergence result stated in [56, Lemma 1].

Proposition 9 Let $\{\Phi_m\}$ be a sequence of random lower-semicontinuous functions
and $\{\Psi_m\}$ be a sequence of deterministic lower-semicontinuous functions. If either
of the following two assumptions holds:

(i) $\Phi_m$ epi-converges in distribution to $\Phi$ and $\Psi_m$ converges to $\Psi$
with respect to the topology of uniform convergence on compact sets;

(ii) $\Phi_m$ converges in distribution to $\Phi$ with respect to the topology of uniform
convergence on compact sets and $\Psi_m$ epi-converges to $\Psi$,

then $\Phi_m + \Psi_m$ epi-converges in distribution to $\Phi + \Psi$.

Based on the above epi-convergence results, we can analyze the asymptotic behavior
of optimal solutions of a sequence of constrained optimization problems. The following
result is a direct consequence of the above epi-convergence theorems and Lemma 7.

Theorem 10 If $\rho_m \to 0$ and $\gamma_m = O_p(1)$, then
$\widehat X_m \xrightarrow{p} \overline X$ as $m \to \infty$.

Then, according to Theorem 10 and lower semi-continuity of the rank function, it is
straightforward to obtain:

Corollary 11 If $\widehat X_m \xrightarrow{p} \overline X$, then
$\lim_{m\to\infty}\Pr\big(\mathrm{rank}(\widehat X_m) \ge \mathrm{rank}(\overline X)\big) = 1$.

In what follows, we focus on the characterization of necessary and sufficient conditions
for rank consistency of $\widehat X_m$. The idea is similar to that of [3] for the
nuclear norm penalized least squares estimator. Note that, unlike for the recovery error
bound, adding more constraints may break the rank consistency. Therefore, we separate the
discussion into the rectangular case and the positive semidefinite case below.

4.1 The rectangular case

Since we have established that $\widehat X_m \xrightarrow{p} \overline X$, we only need
to focus on some neighborhood of $\overline X$ for the discussion about the rank
consistency of $\widehat X_m$. First, we take a look at a local property of the rank
function via the directional derivative of the singular value functions.

Let $\sigma_i'(X;\cdot)$ denote the directional derivative function of the $i$-th largest
singular value function $\sigma_i(\cdot)$ at $X$. From [HI, Section 5.1] and
[11, Proposition 6], for $\mathbb{V}^{n_1\times n_2} \ni H \to 0$,

$$\sigma_i(X+H) - \sigma_i(X) - \sigma_i'(X;H) = O(\|H\|_F^2), \quad i = 1,\ldots,n. \tag{23}$$

Recall that $r = \mathrm{rank}(\overline X)$. From [11, Proposition 6], we have

$$\sigma_{r+1}'(\overline X;H) = \|U_2^{\mathbb T}HV_2\|, \quad H \in \mathbb{V}^{n_1\times n_2}.$$

This leads to the following result for the perturbation of the rank function. A similar
result can also be found in [3, Proposition 18], whose proof is more involved.

Lemma 12 Let $\widehat\Delta \in \mathbb{V}^{n_1\times n_2}$ satisfy
$U_2^{\mathbb T}\widehat\Delta V_2 \ne 0$. Then, for all $\rho \ne 0$ sufficiently small
and $\Delta$ sufficiently close to $\widehat\Delta$,
$\mathrm{rank}(\overline X + \rho\Delta) > \mathrm{rank}(\overline X)$.
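A two-by-two example may help to see what the condition $U_2^{T}\widehat\Delta V_2 \ne 0$ in Lemma 12 is detecting. The matrices below are chosen purely for illustration and do not appear in the paper; here $r = 1$ and $U_2 = V_2 = e_2$.

```latex
\overline{X} = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \qquad
\widehat\Delta = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}
\ \Rightarrow\ U_2^{T}\widehat\Delta V_2 = 1 \ne 0, \quad
\overline{X} + \rho\widehat\Delta = \begin{pmatrix} 1 & 0 \\ 0 & \rho \end{pmatrix}
\ \text{has rank } 2 \text{ for every } \rho \ne 0;
\qquad
\Delta' = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}
\ \Rightarrow\ U_2^{T}\Delta' V_2 = 0, \quad
\overline{X} + \rho\Delta' = \begin{pmatrix} 1 & \rho \\ 0 & 0 \end{pmatrix}
\ \text{has rank } 1.
```

Only the component of the perturbation that lives in the block $U_2^{T}(\cdot)V_2$ raises the rank to first order, which is why this block appears in the conditions that follow.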

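To guarantee the efficiency of the rank-correction term on encouraging a low-rank
solution, the parameter $\rho_m$ should not decay too fast. Define
$\widehat\Delta_m := \rho_m^{-1}(\widehat X_m - \overline X)$. Then, for a slow decay on
$\rho_m$, we can establish the following result.

Proposition 13 If $\rho_m \to 0$, $\sqrt{m}\rho_m \to \infty$ and $\gamma_m = O_p(1)$,
then $\widehat\Delta_m \xrightarrow{p} \widehat\Delta$, where $\widehat\Delta$ is the
unique optimal solution to the following convex optimization problem

$$\begin{aligned} \min_{\Delta\in\mathbb{V}^{n_1\times n_2}}\ & \frac{1}{2}\langle\mathcal{Q}_\beta(\Delta),\Delta\rangle + \langle U_1V_1^{\mathbb T} - F(\overline X),\Delta\rangle + \|U_2^{\mathbb T}\Delta V_2\|_* \\ \text{s.t.}\ \ & \mathcal{R}_\alpha(\Delta) = 0. \end{aligned} \tag{24}$$

Note that $\widehat X_m = \overline X + \rho_m\widehat\Delta_m$. From Corollary 11,
Lemma 12 and Proposition 13, we see that the condition
$U_2^{\mathbb T}\widehat\Delta V_2 = 0$ is necessary for the rank consistency of
$\widehat X_m$. From the following property of the unique solution $\widehat\Delta$ to
(24), we can derive a more detailed necessary condition for rank consistency as stated in
Theorem 15 below.

Lemma 14 Let $\widehat\Delta$ be the optimal solution to (24). Then
$U_2^{\mathbb T}\widehat\Delta V_2 = 0$ if and only if the linear system

$$U_2^{\mathbb T}\mathcal{Q}_\beta^\dagger(U_2\Gamma V_2^{\mathbb T})V_2 = U_2^{\mathbb T}\mathcal{Q}_\beta^\dagger(U_1V_1^{\mathbb T} - F(\overline X))V_2 \tag{25}$$

has a solution $\widehat\Gamma \in \mathbb{V}^{(n_1-r)\times(n_2-r)}$ with
$\|\widehat\Gamma\| \le 1$. Moreover, in this case,

$$\widehat\Delta = \mathcal{Q}_\beta^\dagger\big(U_2\widehat\Gamma V_2^{\mathbb T} - U_1V_1^{\mathbb T} + F(\overline X)\big). \tag{26}$$

Theorem 15 If $\rho_m \to 0$, $\sqrt{m}\rho_m \to \infty$ and $\gamma_m = O_p(1)$, then a
necessary condition for the rank consistency of $\widehat X_m$ is that the linear system
(25) has a solution $\widehat\Gamma \in \mathbb{V}^{(n_1-r)\times(n_2-r)}$ with
$\|\widehat\Gamma\| \le 1$.

By making a slight modification of the necessary condition in Theorem 15, we provide
a sufficient condition for the rank consistency of the estimator $\widehat X_m$ as
follows.

Theorem 16 If $\rho_m \to 0$, $\sqrt{m}\rho_m \to \infty$ and $\gamma_m = O_p(1)$, then a
sufficient condition for the rank consistency of the estimator $\widehat X_m$ is that the
linear system (25) has a unique solution
$\widehat\Gamma \in \mathbb{V}^{(n_1-r)\times(n_2-r)}$ with $\|\widehat\Gamma\| < 1$.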



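4.2 The positive semidefinite case

For the positive semidefinite case, we first need the following Slater condition.

Assumption 5 There exists some $X^0 \in \mathbb{S}^n_{++}$ such that
$\mathcal{R}_\alpha(X^0) = \mathcal{R}_\alpha(\overline X)$.

Proposition 17 If $\rho_m \to 0$, $\sqrt{m}\rho_m \to \infty$ and $\gamma_m = O_p(1)$,
then $\widehat\Delta_m \xrightarrow{p} \widehat\Delta$, where $\widehat\Delta$ is the
unique optimal solution to the following convex optimization problem

$$\begin{aligned} \min_{\Delta\in\mathbb{S}^n}\ & \frac{1}{2}\langle\mathcal{Q}_\beta(\Delta),\Delta\rangle + \langle I_n - F(\overline X),\Delta\rangle \\ \text{s.t.}\ \ & \mathcal{R}_\alpha(\Delta) = 0, \quad P_2^{\mathbb T}\Delta P_2 \in \mathbb{S}^{n-r}_+. \end{aligned} \tag{27}$$

For the optimal solution $\widehat\Delta$ to (27), we also have the following further
characterization.

Lemma 18 Let $\widehat\Delta$ be the optimal solution to (27). Then
$P_2^{\mathbb T}\widehat\Delta P_2 = 0$ if and only if the linear system

$$P_2^{\mathbb T}\mathcal{Q}_\beta^\dagger(P_2\Lambda P_2^{\mathbb T})P_2 = P_2^{\mathbb T}\mathcal{Q}_\beta^\dagger(I_n - F(\overline X))P_2 \tag{28}$$

has a solution $\widehat\Lambda \in \mathbb{S}^{n-r}_+$. Moreover, in this case,

$$\widehat\Delta = \mathcal{Q}_\beta^\dagger\big(P_2\widehat\Lambda P_2^{\mathbb T} - I_n + F(\overline X)\big). \tag{29}$$

Note that Lemma 12 still holds for the positive semidefinite case if
$U_2^{\mathbb T}\widehat\Delta V_2$ is replaced by $P_2^{\mathbb T}\widehat\Delta P_2$.
Therefore, in line with the rectangular case, from Lemma 18 we have the following
necessary condition for rank consistency.

Theorem 19 If $\rho_m \to 0$, $\sqrt{m}\rho_m \to \infty$ and $\gamma_m = O_p(1)$, then a
necessary condition for the rank consistency of $\widehat X_m$ is that the linear system
(28) has a solution $\widehat\Lambda \in \mathbb{S}^{n-r}_+$.

Analogous to Theorem 16, we have the following sufficient condition for rank consistency
for the positive semidefinite case.

Theorem 20 If $\rho_m \to 0$, $\sqrt{m}\rho_m \to \infty$ and $\gamma_m = O_p(1)$, then a
sufficient condition for the rank consistency of $\widehat X_m$ is that the linear system
(28) has a unique solution $\widehat\Lambda \in \mathbb{S}^{n-r}_{++}$.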

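4.3 Constraint nondegeneracy and rank consistency

In this subsection, with the help of constraint nondegeneracy, we provide conditions to
guarantee that the linear systems (25) and (28) have a unique solution. The concept of
constraint nondegeneracy was pioneered by Robinson [61] and later extensively developed
by Bonnans and Shapiro [5]. Consider the following constrained optimization problem

$$\min_{X\in\mathbb{V}^{n_1\times n_2}} \{\Phi(X) + \Psi(X) : \mathcal{A}(X) - b \in K\}, \tag{30}$$

where $\Phi : \mathbb{V}^{n_1\times n_2} \to \mathbb{R}$ is a continuously differentiable
function, $\Psi : \mathbb{V}^{n_1\times n_2} \to \mathbb{R}$ is a convex function,
$\mathcal{A} : \mathbb{V}^{n_1\times n_2} \to \mathbb{R}^l$ is a linear operator,
$b \in \mathbb{R}^l$ is a given vector and $K \subseteq \mathbb{R}^l$ is a closed convex
set. Let $\overline X$ be a given feasible point of (30) and
$\bar z := \mathcal{A}(\overline X) - b$.

When $\Psi$ is differentiable at $\overline X$, we say that the constraint nondegeneracy
holds at $\overline X$ if

$$\mathcal{A}\,\mathbb{V}^{n_1\times n_2} + \mathrm{lin}(T_K(\bar z)) = \mathbb{R}^l, \tag{31}$$

where $T_K(\bar z)$ denotes the tangent cone of $K$ at $\bar z$ and
$\mathrm{lin}(T_K(\bar z))$ denotes the largest linearity space contained in
$T_K(\bar z)$, i.e., $\mathrm{lin}(T_K(\bar z)) = T_K(\bar z) \cap (-T_K(\bar z))$. When
the function $\Psi$ is nondifferentiable, we can rewrite the optimization problem (30)
equivalently as

$$\min_{(X,t)\in\mathbb{V}^{n_1\times n_2}\times\mathbb{R}} \{\Phi(X) + t : \widetilde{\mathcal{A}}(X,t) \in K \times \mathrm{epi}\Psi\},$$

where $\mathrm{epi}\Psi := \{(X,t) \in \mathbb{V}^{n_1\times n_2}\times\mathbb{R} \mid \Psi(X) \le t\}$
denotes the epigraph of $\Psi$ and
$\widetilde{\mathcal{A}} : \mathbb{V}^{n_1\times n_2}\times\mathbb{R} \to \mathbb{R}^l\times\mathbb{V}^{n_1\times n_2}\times\mathbb{R}$
is a linear operator defined by

$$\widetilde{\mathcal{A}}(X,t) := \begin{pmatrix} \mathcal{A}(X) \\ X \\ t \end{pmatrix}, \quad (X,t) \in \mathbb{V}^{n_1\times n_2}\times\mathbb{R}.$$

From (31) and [63, Theorem 6.41], the constraint nondegeneracy holds at
$(\overline X, \bar t)$ with $\bar t = \Psi(\overline X)$ if

$$\widetilde{\mathcal{A}}\begin{pmatrix} \mathbb{V}^{n_1\times n_2} \\ \mathbb{R} \end{pmatrix} + \begin{pmatrix} \mathrm{lin}(T_K(\bar z)) \\ \mathrm{lin}(T_{\mathrm{epi}\Psi}(\overline X,\bar t)) \end{pmatrix} = \begin{pmatrix} \mathbb{R}^l \\ \mathbb{V}^{n_1\times n_2} \\ \mathbb{R} \end{pmatrix}.$$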

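By the definition of $\widetilde{\mathcal{A}}$, it is not difficult to verify that this
condition is equivalent to

$$[\mathcal{A}\ \ 0]\big(\mathrm{lin}(T_{\mathrm{epi}\Psi}(\overline X,\bar t))\big) + \mathrm{lin}(T_K(\bar z)) = \mathbb{R}^l. \tag{32}$$

By letting $\Psi = \|\cdot\|_*$, $\mathcal{A} = \mathcal{R}_\alpha$ and $K = \{0\}$, one
can see that the problem (7) takes the form of (30). By the expression of
$T_{\mathrm{epi}\Psi}(\overline X,\bar t)$ with $\bar t = \|\overline X\|_*$ (e.g., see
[31]), we see that for the problem (7), the condition (32) reduces to

$$\mathcal{R}_\alpha(\mathcal{T}(\overline X)) = \mathbb{R}^{d_1}, \tag{33}$$

where

$$\mathcal{T}(\overline X) = \{H \in \mathbb{V}^{n_1\times n_2} \mid U_2^{\mathbb T}HV_2 = 0\}. \tag{34}$$

Hence, we say that the constraint nondegeneracy holds at $\overline X$ to the problem (7)
if the condition (33) holds. By letting $\Psi = \delta_{\mathbb{S}^n_+}$,
$\mathcal{A} = \mathcal{R}_\alpha$ and $K = \{0\}$, we can see that the problem (8) takes
the form of (30), and now the condition (32) reduces to

$$\mathcal{R}_\alpha\big(\mathrm{lin}(T_{\mathbb{S}^n_+}(\overline X))\big) = \mathbb{R}^{d_1}. \tag{35}$$

Thus, we say that the constraint nondegeneracy holds at $\overline X$ to the problem (8)
if the condition (35) holds. From Arnold's characterization of the tangent cone
$T_{\mathbb{S}^n_+}(\overline X) = \{H \in \mathbb{S}^n \mid P_2^{\mathbb T}HP_2 \in \mathbb{S}^{n-r}_+\}$
in [2], we can write the linearity space $\mathrm{lin}(T_{\mathbb{S}^n_+}(\overline X))$
explicitly as

$$\mathrm{lin}(T_{\mathbb{S}^n_+}(\overline X)) = \{H \in \mathbb{S}^n \mid P_2^{\mathbb T}HP_2 = 0\}.$$

Interestingly, for some special matrix completion problems, the constraint nondegeneracy
automatically holds at $\overline X$, as stated in the following proposition.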

Proposition 21 For the following matrix completion problems:

(i) the covariance matrix completion with partial positive diagonal entries being fixed,
in particular, the correlation matrix completion with all diagonal entries being fixed
as ones;

(ii) the density matrix completion with its trace being fixed as one,

the constraint nondegeneracy (35) holds at $\overline X$.



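Next, we take a closer look at the solutions to the linear systems (25) and (28). Define
linear operators $\mathcal{B}_1 : \mathbb{V}^{r\times r} \to \mathbb{V}^{(n_1-r)\times(n_2-r)}$
and $\mathcal{B}_2 : \mathbb{V}^{(n_1-r)\times(n_2-r)} \to \mathbb{V}^{(n_1-r)\times(n_2-r)}$
associated with $\overline X$, respectively, by

$$\mathcal{B}_1(Y) := U_2^{\mathbb T}\mathcal{Q}_\beta^\dagger(U_1YV_1^{\mathbb T})V_2 \quad\text{and}\quad \mathcal{B}_2(Z) := U_2^{\mathbb T}\mathcal{Q}_\beta^\dagger(U_2ZV_2^{\mathbb T})V_2, \tag{36}$$

where $Y \in \mathbb{V}^{r\times r}$ and $Z \in \mathbb{V}^{(n_1-r)\times(n_2-r)}$. Note
that the operator $\mathcal{B}_2$ is self-adjoint and positive semidefinite according to
the definition of $\mathcal{Q}_\beta^\dagger$. Let $g(\overline X)$ be the vector in
$\mathbb{R}^r$ defined by

$$g(\overline X) := \big(1 - f_1(\sigma(\overline X)),\ldots,1 - f_r(\sigma(\overline X))\big)^{\mathbb T}. \tag{37}$$

Then, by the definition of the spectral operator $F$, we can rewrite (25) in the
following concise form

$$\mathcal{B}_2(\Gamma) = \mathcal{B}_1(\mathrm{Diag}(g(\overline X))), \quad \Gamma \in \mathbb{V}^{(n_1-r)\times(n_2-r)}. \tag{38}$$

For the positive semidefinite case $\mathbb{V}^{n_1\times n_2} = \mathbb{S}^n$ and
$\overline X \in \mathbb{S}^n_+$, both $U_i$ and $V_i$ reduce to $P_i$ for $i = 1,2$. In
this case, the linear system (28) can be concisely written as

$$\mathcal{B}_2(\Lambda) = \mathcal{B}_2(I_{n-r}) + \mathcal{B}_1(\mathrm{Diag}(g(\overline X))), \quad \Lambda \in \mathbb{S}^{n-r}. \tag{39}$$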



Proposition 22 For the rectangular case, if the constraint nondegeneracy (33) holds at
$\overline X$ to the problem (7), then the linear operator $\mathcal{B}_2$ defined by (36)
is self-adjoint and positive definite. For the positive semidefinite case, if the
constraint nondegeneracy (35) holds at $\overline X$ to the problem (8), then the linear
operator $\mathcal{B}_2$ is also self-adjoint and positive definite.

According to Proposition 22, the constraint nondegeneracy at $\overline X$ to the problem
(7) and (8), respectively, implies that the linear system (25) has a unique solution
$\widehat\Gamma = \mathcal{B}_2^{-1}\mathcal{B}_1(\mathrm{Diag}(g(\overline X)))$ and the
linear system (28) has a unique solution
$\widehat\Lambda = I_{n-r} + \mathcal{B}_2^{-1}\mathcal{B}_1(\mathrm{Diag}(g(\overline X)))$.
Then, from Theorems 16 and 20, we can obtain the following main result for rank
consistency.

Theorem 23 Suppose that $\rho_m \to 0$, $\sqrt{m}\rho_m \to \infty$ and
$\gamma_m = O_p(1)$. For the rectangular case, if the constraint nondegeneracy (33) holds
at $\overline X$ to the problem (7) and

$$\|\mathcal{B}_2^{-1}\mathcal{B}_1(\mathrm{Diag}(g(\overline X)))\| < 1, \tag{40}$$

then the estimator $\widehat X_m$ generated from the rank-correction step (7) is rank
consistent. For the positive semidefinite case, if the constraint nondegeneracy (35)
holds at $\overline X$ to the problem (8) and

$$I_{n-r} + \mathcal{B}_2^{-1}\mathcal{B}_1(\mathrm{Diag}(g(\overline X))) \in \mathbb{S}^{n-r}_{++}, \tag{41}$$

then the estimator $\widehat X_m$ generated from the rank-correction step (8) is rank
consistent.



From Theorem 23, it is not difficult to see that there exists some threshold
$\bar\varepsilon > 0$ (depending on $\overline X$) such that the condition (40) holds if
$|1 - f_i(\sigma(\overline X))| \le \bar\varepsilon$ for all $1 \le i \le r$. In other
words, when $F(\overline X)$ is sufficiently close to $U_1V_1^{\mathbb T}$, the condition
(40) holds automatically and so does the rank consistency. Thus, Theorem 23 provides us a
guideline to construct a suitable rank-correction function for rank consistency. This is
another important aspect of what we can benefit from the rank-correction step, besides
the reduction of recovery error discussed in Section 3.

The next theorem shows that for the covariance (correlation) and density matrix
completion problems with fixed basis coefficients described in Proposition 21, if
observations are sampled uniformly at random, the rank consistency can be guaranteed for
a broad class of rank-correction functions $F$.

Theorem 24 For the covariance (correlation) and density matrix completion problems
defined in Proposition 21 under uniform sampling, if $\rho_m \to 0$,
$\sqrt{m}\rho_m \to \infty$, $\gamma_m = O_p(1)$ and $F$ is a spectral operator associated
with a symmetric function $f : \mathbb{R}^n_+ \to \mathbb{R}^n$ such that for
$i = 1,\ldots,n$,

$$f_i(x) \ge 0 \ \ \forall\, x \in \mathbb{R}^n_+ \quad\text{and}\quad f_i(x) = 0 \ \text{if and only if}\ x_i = 0, \tag{42}$$

then the estimator $\widehat X_m$ generated from the rank-correction step is rank
consistent.



5 Construction of the rank-correction function 



In this section, we focus on the construction of a suitable rank-correction function $F$
based on the results obtained in Sections 3 and 4. As can be seen from Theorem 6, a
smaller value of $a_m/b_m$ potentially leads to a smaller recovery error. Thus, we desire
a construction of the rank-correction function such that $F(\widetilde X_m)$ is close to
$U_1V_1^{\mathbb T}$. Meanwhile, according to Theorem 23, we also desire that
$F(\overline X)$ is close to $U_1V_1^{\mathbb T}$ for rank consistency. Notice that a
reasonable initial estimator $\widetilde X_m$ should not deviate too much from the true
matrix $\overline X$. Therefore, the above two criteria consistently suggest a natural
idea to construct a rank-correction function $F$, if possible, such that

$$F(X) \to U_1V_1^{\mathbb T} \quad\text{as}\quad X \to \overline X. \tag{43}$$

Next, we proceed to the construction of the rank-correction function $F$ for the
rectangular case. For the positive semidefinite case, one may just replace the singular
value decomposition with the eigenvalue decomposition and conduct exactly the same
analysis.

5.1 The rank is known 

If the rank of the true matrix $\overline X$ is known in advance, we construct the
rank-correction function $F$ by

$$F(X) := U_1V_1^{\mathbb T}, \tag{44}$$

where $(U,V) \in \mathbb{O}^{n_1,n_2}(X)$ and $X \in \mathbb{V}^{n_1\times n_2}$. Note that
$F$ defined by (44) is not a spectral operator over the whole space of
$\mathbb{V}^{n_1\times n_2}$, but in a neighborhood of $\overline X$ it is indeed a
spectral operator and is actually twice continuously differentiable (see, e.g.,
[11, Proposition 8]). Hence, it satisfies the criterion (43). With this rank-correction
function, the rank-correction step is essentially the same as one step of the majorized
penalty method developed in [25]. By Theorem 16 and Proposition 22, we immediately obtain
the following result.

Corollary 25 Suppose that the rank of the true matrix $\overline X$ is known and the
constraint nondegeneracy holds at $\overline X$. If $\rho_m \to 0$,
$\sqrt{m}\rho_m \to \infty$, $\gamma_m = O_p(1)$ and $F$ is chosen by (44), then the
estimator $\widehat X_m$ generated from the rank-correction step is rank consistent.



5.2 The rank is unknown 

If the rank of the true matrix $\overline X$ is unknown, then the rank-correction function
$F$ cannot be defined by (44). What we will do is to construct a spectral operator $F$ to
imitate the case when the rank is known. Here, we propose $F$ to be a spectral operator

$$F(X) := U\,\mathrm{Diag}(f(\sigma(X)))\,V^{\mathbb T} \tag{45}$$

associated with the symmetric function $f : \mathbb{R}^n \to \mathbb{R}^n$ defined by

$$f_i(x) = \begin{cases} \phi\Big(\dfrac{x_i}{\|x\|_\infty}\Big) & \text{if } x \in \mathbb{R}^n\setminus\{0\}, \\ 0 & \text{if } x = 0, \end{cases} \tag{46}$$

where $(U,V) \in \mathbb{O}^{n_1,n_2}(X)$, $X \in \mathbb{V}^{n_1\times n_2}$, and the
scalar function $\phi : \mathbb{R} \to \mathbb{R}$ takes the form

$$\phi(t) := \mathrm{sgn}(t)(1+\varepsilon^\tau)\frac{|t|^\tau}{|t|^\tau+\varepsilon^\tau}, \quad t \in \mathbb{R}, \tag{47}$$

for some $\tau > 0$ and $\varepsilon > 0$. By noting that for each $t$,
$\phi(t) \to \mathrm{sgn}(t)$ as $\varepsilon \downarrow 0$, we directly obtain the
following result.

Corollary 26 Suppose that the constraint nondegeneracy holds at $\overline X$. If
$\rho_m \to 0$, $\sqrt{m}\rho_m \to \infty$ and $\gamma_m = O_p(1)$, then for any given
$\tau > 0$, there exists some $\bar\varepsilon > 0$ such that for any $F$ defined by (45),
(46) and (47) with $0 < \varepsilon \le \bar\varepsilon$, the estimator $\widehat X_m$
generated from the rank-correction step is rank consistent.
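To make the construction (45)-(47) concrete, the following sketch evaluates the scalar function and the symmetric function on a vector of singular values. It is a minimal illustration under a stated assumption: $\phi(t) = \mathrm{sgn}(t)(1+\varepsilon^\tau)|t|^\tau/(|t|^\tau+\varepsilon^\tau)$, which is the form consistent with the normalization $\phi(0)=0$, $\phi(1)=1$ and the value $\phi(\varepsilon) = (1+\varepsilon^\tau)/2$ used in this section. The names `phi` and `f` are chosen here for readability, and the singular vectors needed to assemble $F(X)$ itself are not computed.

```python
import math

def phi(t, eps, tau):
    """Scalar function: sgn(t) * (1 + eps^tau) * |t|^tau / (|t|^tau + eps^tau)."""
    if t == 0:
        return 0.0
    a = abs(t) ** tau
    e = eps ** tau
    return math.copysign(1.0, t) * (1.0 + e) * a / (a + e)

def f(x, eps, tau):
    """Symmetric function: phi applied to x_i / ||x||_inf, and the zero vector at x = 0."""
    scale = max(abs(xi) for xi in x)
    if scale == 0.0:
        return [0.0 for _ in x]
    return [phi(xi / scale, eps, tau) for xi in x]

# Hypothetical singular values: two dominant ones, one small, one zero.
sigma = [4.0, 2.0, 0.04, 0.0]
weights = f(sigma, eps=0.02, tau=2)
```

With these illustrative values, the dominant singular values receive weights close to one, the exactly zero one receives weight zero, and the small one (ratio 0.01, below $\varepsilon = 0.02$) receives a weight well below one half.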



Corollary 26 indicates that one needs to choose a small $\varepsilon > 0$ in pursuit of
rank consistency. Meanwhile, we also need to take care of the influence of a small
$\varepsilon > 0$ on the recovery error bound, which depends on the value of $a_m/b_m$.
Certainly, we desire $a_m \approx 0$ and $b_m \approx 1$. This motivates us to choose a
function $\phi$, if possible, such that

$$\phi\Big(\frac{\sigma_i(\widetilde X_m)}{\sigma_1(\widetilde X_m)}\Big) \approx \begin{cases} 1 & \text{if } 1 \le i \le \mathrm{rank}(\overline X), \\ 0 & \text{if } \mathrm{rank}(\overline X)+1 \le i \le n. \end{cases} \tag{48}$$

This is also why we normalize the function $\phi$ defined by (47) in the interval
$t \in [0,1]$ such that $\phi(0) = 0$ and $\phi(1) = 1$. However, as indicated by
Corollary 11, the initial estimator $\widetilde X_m$ is very likely to have a higher rank
than $\overline X$ when it approaches $\overline X$. It turns out that when
$\varepsilon > 0$ is tiny,
$\phi(\sigma_i(\widetilde X_m)/\sigma_1(\widetilde X_m)) \approx 1$ for
$\mathrm{rank}(\overline X)+1 \le i \le \mathrm{rank}(\widetilde X_m)$, which violates our
desired property (48). As a result, $\varepsilon > 0$ should be chosen to be small but
balanced. Notice that $\phi(\varepsilon) = (1+\varepsilon^\tau)/2 \approx 1/2$ if
$\varepsilon > 0$ is small and $\tau > 0$ is not too small. Thus, the value of
$\varepsilon$ can be regarded as a dividing line of confidence on whether
$\sigma_i(\widetilde X_m)$ is believed to come from a nonzero singular value of
$\overline X$ with perturbation: positive confidence if
$\sigma_i(\widetilde X_m) > \varepsilon\sigma_1(\widetilde X_m)$ and negative confidence
if $\sigma_i(\widetilde X_m) < \varepsilon\sigma_1(\widetilde X_m)$. On the other hand,
the parameter $\tau > 0$ mainly controls the shape of the function $\phi$ over
$t \in [0,1]$. The function $\phi$ is concave if $0 < \tau \le 1$ and S-shaped with a
single inflection point at $t = \big(\frac{\tau-1}{\tau+1}\big)^{1/\tau}\varepsilon$ if
$\tau > 1$. Moreover, the steepness of the function $\phi$ increases when $\tau$
increases. In particular, if $0 < \varepsilon < 1$ and $\tau$ is very large, $\phi$ is
very close to the step function taking the value 0 if $0 \le t < \varepsilon$ and the
value 1 if $\varepsilon < t \le 1$. In this case, there exists some $\varepsilon$ such
that the desired property (48) can be achieved and that the corresponding rank-correction
function $F$ is very close to the one defined by (44). Thus, it seems to be a good idea to
choose an S-shaped function $\phi$ with a large $\tau$. However, in practice, the
parameter $\varepsilon$ should be pre-determined. Since $\mathrm{rank}(\overline X)$ is
unknown and the singular values of $\widetilde X_m$ are unpredictable, it is hard to
choose a suitable $\varepsilon$ in advance, and hence it will be too risky to choose a
large $\tau$ for recovery. As a result, one has to be somewhat conservative in choosing
$\tau$, sacrificing some optimality of recovery in exchange for robustness. If the
initial estimator is generated from the nuclear norm penalized least squares problem, we
recommend the choices $\tau = 1$ or $2$ and $\varepsilon = 0.01 \sim 0.1$, as these
choices show stable performance for plenty of problems, as validated in Section 6.

We also remark that for the positive semidefinite case, the rank-correction function $F$
defined by (45), (46) and (47) is related to the reweighted trace norm for the matrix
rank minimization proposed by Fazel et al. [20, 53]. The reweighted trace norm in
[20, 53] for the positive semidefinite case is
$\langle (X^k + \varepsilon I_n)^{-1}, X\rangle$, which arises from the derivative of the
surrogate function $\log\det(X + \varepsilon I_n)$ of the rank function at an iterate
$X^k$, where $\varepsilon$ is a small positive constant. Meanwhile, in our proposed
rank-correction step, if we choose $\tau = 1$, then
$I_n - \frac{1}{1+\varepsilon}F(\widetilde X_m) = \varepsilon'(\widetilde X_m + \varepsilon' I_n)^{-1}$
with $\varepsilon' = \varepsilon\|\widetilde X_m\|$. Superficially, similarity occurs;
however, it is notable that $\varepsilon'$ depends on $\widetilde X_m$, which is different
from the constant $\varepsilon$ in [20, 53]. More broadly speaking, the rank-correction
function $F$ defined by (45), (46) and (47) is not a gradient of any real-valued
function. This distinguishes our proposed rank-correction step from the reweighted trace
norm minimization in [20, 53] even for the positive semidefinite case.
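The identity behind the comparison with the reweighted trace norm can be checked eigenvalue by eigenvalue. The derivation below is a sketch under stated assumptions: $\tau = 1$, $\phi(t) = (1+\varepsilon)t/(t+\varepsilon)$ for $t \ge 0$, and $\lambda_1 \ge \cdots \ge \lambda_n \ge 0$ denote the eigenvalues of a nonzero positive semidefinite initial estimator $\widetilde X_m$, so that $\|\widetilde X_m\| = \lambda_1$.

```latex
t_i := \frac{\lambda_i}{\|\widetilde X_m\|}, \qquad
1 - \frac{\phi(t_i)}{1+\varepsilon}
  = 1 - \frac{t_i}{t_i+\varepsilon}
  = \frac{\varepsilon}{t_i+\varepsilon}
  = \frac{\varepsilon\|\widetilde X_m\|}{\lambda_i+\varepsilon\|\widetilde X_m\|}
  = \frac{\varepsilon'}{\lambda_i+\varepsilon'},
\qquad \varepsilon' := \varepsilon\|\widetilde X_m\|.
```

The right-hand side is the $i$-th eigenvalue of $\varepsilon'(\widetilde X_m + \varepsilon' I_n)^{-1}$ in the same eigenbasis, which gives the stated matrix identity and makes the dependence of $\varepsilon'$ on $\widetilde X_m$ explicit.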



6 Numerical experiments 

In this section, we validate the power of our proposed rank-corrected procedure on the
recovery by applying it to the positive semidefinite matrix completion problems. In
solving the optimization problem in the rank-correction step (8), we adopted the code
developed by Jiang et al. [31] for large scale linearly constrained convex semidefinite
programming problems. The implemented code is based on an inexact version of the
accelerated proximal gradient method [55]. All tests were run in MATLAB under the
Windows 7.0 operating system on an Intel Core(TM) i7-2720QM 2.20GHz CPU with
8.00GB memory.

For convenience, in the sequel, the NNPLS estimator and the RCS estimator, respectively,
stand for the estimators from the nuclear norm penalized least squares problem
(i.e., the problem (8) with $F \equiv 0$ and $\gamma_m = 0$) and the rank-correction
step. Let $\widehat X_m$ be an estimator. The relative error (relerr for short) of
$\widehat X_m$ is defined by

$$\mathrm{relerr} = \frac{\|\widehat X_m - \overline X\|_F}{\max(10^{-8}, \|\overline X\|_F)}.$$
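The relative error can be computed directly from its definition. The following sketch is only an illustration in plain Python on nested lists; it takes the definition to be $\|\widehat X_m - \overline X\|_F / \max(10^{-8}, \|\overline X\|_F)$, and the function names are chosen here, not taken from the experiments' MATLAB code.

```python
import math

def frobenius_norm(a):
    """Frobenius norm of a matrix stored as a list of rows."""
    return math.sqrt(sum(x * x for row in a for x in row))

def relerr(x_hat, x_bar):
    """Relative error ||x_hat - x_bar||_F / max(1e-8, ||x_bar||_F)."""
    diff = [[p - q for p, q in zip(row_hat, row_bar)]
            for row_hat, row_bar in zip(x_hat, x_bar)]
    return frobenius_norm(diff) / max(1e-8, frobenius_norm(x_bar))

example = relerr([[1.1, 0.0], [0.0, 0.9]], [[1.0, 0.0], [0.0, 1.0]])
```

The floor $10^{-8}$ in the denominator only matters when the true matrix is (numerically) zero, where it prevents a division by zero.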

6.1 Influence of fixed basis coefficients on the recovery

In this subsection, we take the correlation matrix completion for example to test the
performance of the NNPLS estimator and the RCS estimator with different patterns
of fixed basis coefficients. We randomly generated the true matrix $\overline X$ by the
following command:

M = randn(n,r); ML = weight*M(:,1:k); M(:,1:k) = ML; Xtemp = M*M';
D = diag(1./sqrt(diag(Xtemp))); X_bar = D*Xtemp*D;   % rescale to unit diagonal

where the parameter weight is used to control the relative magnitude difference between
the first k largest eigenvalues and the other nonzero eigenvalues. In our experiment, we
set weight = 5 and k = 1, and took $\overline X$ = X_bar with dimension n = 1000 and
rank r = 5. We randomly fixed partial diagonal and off-diagonal entries of $\overline X$
and sampled the rest of the entries uniformly at random with i.i.d. Gaussian noise at the
noise level 10%.
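For readers without MATLAB, the generation of the true matrix can be mirrored in a few lines. The sketch below is a pure-Python transcription for small sizes only, written under the assumption that the command builds M*M' from a Gaussian n-by-r factor whose first k columns are scaled by weight and then rescales to unit diagonal; the function name is hypothetical, and the random numbers will not match those of MATLAB's randn.

```python
import math
import random

def random_correlation_matrix(n, r, weight, k, seed=0):
    """Rank-(at most r) correlation matrix with the first k factor columns boosted."""
    rng = random.Random(seed)
    m = [[rng.gauss(0.0, 1.0) for _ in range(r)] for _ in range(n)]   # M = randn(n,r)
    for row in m:
        for j in range(k):                                           # M(:,1:k) = weight*M(:,1:k)
            row[j] *= weight
    x = [[sum(a * b for a, b in zip(m[i], m[j])) for j in range(n)]   # Xtemp = M*M'
         for i in range(n)]
    d = [1.0 / math.sqrt(x[i][i]) for i in range(n)]                  # D = diag(1./sqrt(diag(Xtemp)))
    return [[d[i] * x[i][j] * d[j] for j in range(n)] for i in range(n)]

x_bar = random_correlation_matrix(n=8, r=2, weight=5.0, k=1)
```

The output is symmetric with ones on the diagonal, as a correlation matrix should be; replacing the rescaling vector `d` by all ones gives the covariance variant.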

In Figure 2, we plot the curves of the relative error and the rank of the NNPLS
estimator and the RCS estimator with different patterns of fixed entries. In the captions
of the subfigures, diag means the number of fixed diagonal entries and non-diag means
the number of fixed off-diagonal entries. The subfigures on the left-hand side and the
right-hand side show the performance of the NNPLS estimator and the RCS estimator,
respectively. For the RCS estimator, the rank-correction function $F$ is defined by (45),
(46) and (47) with $\tau = 2$ and $\varepsilon = 0.02$, and the initial $\widetilde X_m$
is chosen from those points of the corresponding subfigures on the left-hand side such
that $\big|\,\|y - \mathcal{R}_\Omega(\widetilde X_m)\|_2/\|y\|_2 - 0.1\,\big|$ attains
the smallest value.

From the subfigures on the left-hand side, we observe that as the number of fixed
diagonal entries increases, the parameter $\rho_m$ for the smallest recovery error
deviates more and more from the one for attaining the true rank. In particular, when
diag = n, the NNPLS estimator reduces to the (constrained) least squares estimator so
that one cannot benefit from the NNPLS estimator for encouraging a low-rank solution.
This implies that the NNPLS estimator does not possess the rank consistency when some
entries are fixed. However, the subfigures on the right-hand side indicate that the RCS
estimator can yield a solution with the correct rank as well as a desired small recovery
error simultaneously, with the parameter $\rho_m$ in a large interval. This exactly
validates the theoretical result of Theorem 24 for rank consistency.

Figure 2: Influence of fixed basis coefficients on recovery (sample ratio = 6.38%)



6.2 Performance of different rank-correction functions for recovery 



In this subsection, we test the performance of different rank-correction functions for
recovering a correlation matrix. We randomly generated the true matrix $\overline X$ by
the command in Subsection 6.1 with n = 1000, r = 10, weight = 2 and k = 5. We fixed all
the diagonal entries of $\overline X$ and sampled partial off-diagonal entries uniformly
at random with i.i.d. Gaussian noise at the noise level 10%. We chose the (nuclear norm
penalized) least squares estimator to be the initial estimator $\widetilde X_m$. In
Figure 3, we plot four curves corresponding to the rank-correction functions $F$ defined
by (45), (46) and (47) with $\tau = 2$ and different $\varepsilon$, and another two
curves corresponding to the rank-correction functions $F$ defined by (44) at
$\widetilde X_m$ (i.e., $\widetilde U_1\widetilde V_1^{\mathbb T}$) and $\overline X$
(i.e., $U_1V_1^{\mathbb T}$), respectively. The values of $a_m$, $b_m$ and the optimal
recovery error with different $\rho_m$ are listed in Table 1.

As can be seen from Figure 3, when $\rho_m$ increases, the recovery error decreases with
the rank and then increases after the correct rank is attained, except for the case
$U_1V_1^{\mathbb T}$. This validates our discussion about the recovery error at the end
of Section 3. Moreover, for a smaller $\varepsilon$, the curve of recovery error changes
more gently, though a certain optimality in the sense of recovery error is sacrificed.
This means that the choice of a relatively small $\varepsilon$, say 0.01 or 0.02, is more
robust for those ill-conditioned problems. From Table 1, we see that a smaller $a_m/b_m$
corresponds to a better optimal recovery error. It is worthwhile to point out that, even
if $a_m/b_m$ is larger than 1, the performance of the RCS estimator for recovery is still
much better than that of the NNPLS estimator.



Table 1: Influence of the rank-correction term on the recovery error

rank-correction function                             a_m      b_m      a_m/b_m   optimal relerr
zero function                                        1        1        1         10.85%
$\varepsilon = 0.01$, $\tau = 2$                     0.1420   0.2351   0.6038    5.96%
$\varepsilon = 0.02$, $\tau = 2$                     0.1459   0.5514   0.2646    5.80%
$\varepsilon = 0.05$, $\tau = 2$                     0.1648   0.8846   0.1863    5.75%
$\varepsilon = 0.1$, $\tau = 2$                      0.2399   0.9681   0.2478    5.77%
$\widetilde U_1\widetilde V_1^{\mathbb T}$ (initial) 0.1445   0.9815   0.1472    5.75%
$U_1V_1^{\mathbb T}$ (true)                          0        1        0         2.25%



6.3 Performance for different matrix completion problems 

In this subsection, we test the performance of the RCS estimator for the covariance and
density matrix completion problems. As can be seen from Figure 2, a good choice of
the parameter $\rho_m$ for the RCS estimator could be the smallest one such that the rank
becomes stable. Such a parameter $\rho_m$ can be found by the bisection search method.
This is actually what we benefit from rank consistency. In the following numerical
experiments, we apply the above strategy to find a suitable $\rho_m$ for the RCS
estimator, and choose the rank-correction function $F$ defined by (45), (46) and (47)
with $\tau = 2$ and $\varepsilon = 0.02$.
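The bisection strategy for the penalty parameter can be sketched as follows. This is a hedged illustration, not the procedure used in the experiments: it assumes that the rank of the computed estimator is non-increasing in the parameter on the search interval and that the rank at the right end point is the stable one, and `rank_of` is a hypothetical callback standing in for "solve the rank-correction step and return the rank of the solution".

```python
def smallest_stable_rho(rank_of, lo, hi, tol=1e-3):
    """Bisection for (approximately) the smallest rho in [lo, hi] at which the rank is stable.

    Assumes rank_of is non-increasing on [lo, hi] and that rank_of(hi) is the stable rank.
    """
    stable = rank_of(hi)
    if rank_of(lo) == stable:
        return lo
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if rank_of(mid) == stable:
            hi = mid          # rank already stable at mid: search to the left
        else:
            lo = mid          # rank still too high at mid: search to the right
    return hi

# Toy stand-in for the solver: the rank drops from 12 to 5 once rho reaches 0.3.
def toy_rank(rho):
    return 5 if rho >= 0.3 else 12

rho = smallest_stable_rho(toy_rank, lo=0.0, hi=1.0)
```

Each bisection step costs one solve of the rank-correction problem, so the tolerance `tol` trades accuracy of the selected parameter against the number of solves.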
Figure 3: Influence of the rank-correction term on the recovery

We first take the covariance matrix completion as an example to test the performance of
the RCS estimator with different initial estimators $\widetilde{X}_m$. The true matrix $\overline{X}$ is generated
by the command in Subsection 6.1 with n = 500, r = 5, weight = 3 and k = 1, except that
D = eye(n). We depict the numerical results in Figure 4, where the dashed curves represent
the relative recovery error and the rank of the NNPLS estimator with different $\rho_m$, and
the solid curves represent the relative recovery error and the rank of the RCS estimator
with $\widetilde{X}_m$ chosen to be the corresponding NNPLS estimator. As can be seen from Figure
4, the RCS estimator substantially improves the quality of the NNPLS estimator in terms
of both the recovery error and the rank. We also observe that when the initial $\widetilde{X}_m$
deviates greatly from the true matrix, the quality of the RCS estimator may still be
unsatisfactory. Thus, it is natural to ask whether further rank-correction steps could improve
the quality. The answer can be found in Table 2 below, where the numerical results
of the covariance matrix completion are reported. We also report the numerical results
of the density matrix completion in Table 3.



[Figure 4 panel title: Covariance matrix completion (n = 500, rank = 5, noise level = 10%, sample ratio = 6.37%, nfix_diag = n/5, nfix_offdiag = n/5)]
Figure 4: Performance of the RCS estimator with different initial $\widetilde{X}_m$



For the covariance matrix completion problems, we generated the true matrix $\overline{X}$ by
the command in Subsection 6.1 with n = 1000, weight = 3 and k = 1, except that
D = eye(n). The rank of $\overline{X}$ and the numbers of fixed diagonal and off-diagonal entries
of $\overline{X}$ are reported in the first and the second columns of Table 2, respectively. We sampled
partial off-diagonal entries uniformly at random with i.i.d. Gaussian noise at the noise
level 10%. The first RCS estimator uses the NNPLS estimator as the initial estimator
$\widetilde{X}_m$, and the second (third) RCS estimator uses the first (second) RCS estimator as
the initial estimator $\widetilde{X}_m$. From Table 2, we see that when the sample ratio is reasonable,
one rank-correction step is enough to yield a desired result. Meanwhile, when the sample
ratio is very low, especially if some off-diagonal entries are further fixed, one or two more
rank-correction steps can still improve the quality of estimation.
Table 2: Performance for covariance matrix completion problems with n = 1000

 r    diag/       sample   NNPLS            1st RCS          2nd RCS          3rd RCS
      off-diag    ratio    relerr (rank)    relerr (rank)    relerr (rank)    relerr (rank)
 5    1000/0      2.40%    1.95e-1 (47)     1.27e-1 (5)      1.18e-1 (5)      1.12e-1 (5)
 5    1000/0      7.99%    6.10e-2 (51)     3.41e-2 (5)      3.37e-2 (5)      3.36e-2 (5)
 5    500/50      2.39%    2.01e-1 (45)     1.10e-1 (5)      9.47e-2 (5)      8.97e-2 (5)
 5    500/50      7.98%    7.19e-2 (32)     3.77e-2 (5)      3.59e-2 (5)      3.58e-2 (5)
 10   1000/0      5.38%    1.32e-1 (74)     7.68e-2 (10)     7.39e-2 (10)     7.36e-2 (10)
 10   1000/0      8.96%    9.18e-2 (78)     5.15e-2 (10)     5.08e-2 (10)     5.08e-2 (10)
 10   500/100     5.37%    1.58e-1 (57)     8.66e-2 (10)     7.74e-2 (10)     7.60e-2 (10)
 10   500/100     8.96%    1.02e-1 (49)     5.36e-2 (10)     5.24e-2 (10)     5.25e-2 (10)


Table 3: Performance for density matrix completion problems with n = 1024

                    noise   sample   NNPLS1                    NNPLS2                    RCS
 noise         r    level   ratio    fidelity  relerr   rank   fidelity  relerr   rank   fidelity  relerr   rank
 statistical   3    10.0%   1.5%     0.697     2.59e-1  3      0.955     2.50e-1  3      0.987     1.02e-1  3
 statistical   3    10.0%   4.0%     0.915     8.04e-2  3      0.997     6.84e-2  3      0.998     4.13e-2  3
 statistical   5    10.0%   2.0%     0.550     3.71e-1  5      0.908     4.23e-1  5      0.972     1.61e-1  5
 statistical   5    10.0%   5.0%     0.889     1.03e-1  5      0.995     9.18e-2  5      0.997     4.91e-2  5
 mixed         3    12.4%   1.5%     0.654     2.93e-1  3      0.957     2.43e-1  3      0.988     1.06e-1  3
 mixed         3    12.4%   4.0%     0.832     1.49e-1  3      0.995     8.14e-2  3      0.997     6.41e-2  3
 mixed         5    12.4%   2.0%     0.521     3.95e-1  5      0.912     4.09e-1  5      0.977     1.51e-1  5
 mixed         5    12.5%   5.0%     0.817     1.61e-1  5      0.987     1.01e-1  5      0.996     7.09e-2  5
For the density matrix completion problems, we generated the true density matrix $\overline{X}$
by the following command:

M = randn(n,r) + i*randn(n,r); ML = weight*M(:,1:k); M(:,1:k) = ML;
Xtemp = M*M'; X_bar = Xtemp/sum(diag(Xtemp));
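For readers without MATLAB, the same construction can be sketched in plain Python. This is an illustrative translation, not the code used for the experiments: the function name `random_density_matrix`, the `seed` argument and the small sizes in the usage line are assumptions made here, and nested lists are used instead of a matrix library to keep the sketch dependency-free.

```python
import random


def random_density_matrix(n, r, weight=2.0, k=1, seed=0):
    """Return an n-by-n Hermitian PSD matrix of rank at most r with unit trace."""
    rng = random.Random(seed)
    # M = randn(n,r) + i*randn(n,r): complex Gaussian factor
    M = [[complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(r)]
         for _ in range(n)]
    # M(:,1:k) = weight*M(:,1:k): amplify the first k columns
    for row in M:
        for j in range(k):
            row[j] *= weight
    # Xtemp = M*M' (conjugate transpose), hence Hermitian positive semidefinite
    X = [[sum(M[a][j] * M[b][j].conjugate() for j in range(r)) for b in range(n)]
         for a in range(n)]
    # X_bar = Xtemp / trace(Xtemp): normalize to trace one
    trace = sum(X[a][a].real for a in range(n))
    return [[entry / trace for entry in row] for row in X]


X_bar = random_density_matrix(n=6, r=2)
```

The normalization by the trace is what turns the low-rank positive semidefinite matrix into a density matrix; the weight on the first `k` columns only skews the spectrum.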

During the testing, we set n = 1024, weight = 2 and k = 1, and sampled partial Pauli
measurements except the trace of $\overline{X}$ uniformly at random with i.i.d. Gaussian noise at
the noise level 10%. Besides the above statistical noise, we further added the depolarizing
noise, which frequently appears in quantum systems, with strength 0.01. This case is
labeled as the mixed noise in the last four rows of Table 3. We remark here that the
depolarizing noise differs from our assumption on noise since it does not have randomness.
One may refer to [29, 21] for details of the quantum depolarizing channel. In Table 3,
the (squared) fidelity is a measure of the closeness of two quantum states, defined by
$\|\widehat{X}_m^{1/2}\,\overline{X}^{1/2}\|_*^2$; the NNPLS1 estimator means the NNPLS estimator obtained by dropping
the trace-one constraint; and the NNPLS2 estimator means the one obtained by normalizing the
NNPLS1 estimator to be of trace one. Note that the NNPLS2 estimator was used
by Flammia et al. [21]. Table 3 shows that the RCS estimator is superior to the NNPLS2
estimator in terms of both the fidelity and the relative error.



7 Conclusions 

In this paper, we proposed a rank-corrected procedure for low-rank matrix completion 
problems with fixed basis coefficients. This approach can substantially overcome the 
limitation of the nuclear norm penalization for recovering a low-rank matrix. We studied 
the impact of adding the rank-correction term on both the reduction of the recovery 
error bounds and the rank consistency (in the sense of Bach [3]). Due to the presence of
fixed basis coefficients, constraint nondegeneracy plays an important role in our analysis. 
Extensive numerical experiments show that our approach can significantly improve the 
recovery performance in the sense of both the recovery error and the rank, compared 
with the nuclear norm penalized least squares estimator. As a byproduct, our results also
provide a theoretical foundation for the majorized penalty method of Gao and Sun [25]
and Gao [24] for structured low-rank matrix optimization problems.

Our proposed rank-correction step also allows additional constraints according to 
other possible prior information. In particular, for additional linear constraints, all the 
theoretical results in this paper hold with slight modifications. In order to better fit the 
under-sampling setting of matrix completion, in future work it would be of great
interest to extend the asymptotic rank consistency results to the case that the matrix
size is allowed to grow. It would also be interesting to extend this approach to deal with
other low-rank matrix problems.

Appendix

Proof of Theorem 1

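Let $\Delta_m := \widehat{X}_m - \overline{X}$. Since $\widehat{X}_m$ is optimal to (7) and $\overline{X}$ is feasible to (7), it follows that

\[
\frac{1}{2m}\|\mathcal{R}_\Omega(\Delta_m)\|_2^2 \le \Big\langle \frac{\nu}{m}\mathcal{R}_\Omega^*(\xi), \Delta_m\Big\rangle - \rho_m\Big(\|\widehat{X}_m\|_* - \|\overline{X}\|_* - \langle F(\widetilde{X}_m) + \gamma_m\widetilde{X}_m, \Delta_m\rangle\Big) + \frac{\rho_m\gamma_m}{2}\Big(\|\overline{X}\|_F^2 - \|\widehat{X}_m\|_F^2\Big). \tag{49}
\]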



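Then, it follows from (13) that

\[
\Big\langle \frac{\nu}{m}\mathcal{R}_\Omega^*(\xi), \Delta_m\Big\rangle \le \nu\Big\|\frac{1}{m}\mathcal{R}_\Omega^*(\xi)\Big\|\,\|\Delta_m\|_* \le \frac{\rho_m b_m}{\kappa}\Big(\|\mathcal{P}_T(\Delta_m)\|_* + \|\mathcal{P}_{T^\perp}(\Delta_m)\|_*\Big). \tag{50}
\]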



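From the directional derivative of the nuclear norm at $\overline{X}$ (see [71, Theorem 1]), we have

\[
\|\widehat{X}_m\|_* - \|\overline{X}\|_* \ge \langle U_1V_1^T, \Delta_m\rangle + \|U_2^T\Delta_mV_2\|_*.
\]

This, together with equations (11) and (12), implies that

\begin{align*}
&\|\widehat{X}_m\|_* - \|\overline{X}\|_* - \langle F(\widetilde{X}_m) + \gamma_m\widetilde{X}_m, \Delta_m\rangle \\
&\quad\ge \langle U_1V_1^T, \Delta_m\rangle + \|U_2^T\Delta_mV_2\|_* - \langle F(\widetilde{X}_m) + \gamma_m\widetilde{X}_m, \Delta_m\rangle \\
&\quad= \langle U_1V_1^T - \mathcal{P}_T(F(\widetilde{X}_m) + \gamma_m\widetilde{X}_m), \Delta_m\rangle + \|\mathcal{P}_{T^\perp}(\Delta_m)\|_* - \langle \mathcal{P}_{T^\perp}(F(\widetilde{X}_m) + \gamma_m\widetilde{X}_m), \Delta_m\rangle \\
&\quad= \langle U_1V_1^T - \mathcal{P}_T(F(\widetilde{X}_m) + \gamma_m\widetilde{X}_m), \mathcal{P}_T(\Delta_m)\rangle + \|\mathcal{P}_{T^\perp}(\Delta_m)\|_* - \langle \mathcal{P}_{T^\perp}(F(\widetilde{X}_m) + \gamma_m\widetilde{X}_m), \mathcal{P}_{T^\perp}(\Delta_m)\rangle \\
&\quad\ge -\|U_1V_1^T - \mathcal{P}_T(F(\widetilde{X}_m) + \gamma_m\widetilde{X}_m)\|\,\|\mathcal{P}_T(\Delta_m)\|_* + \big(1 - \|\mathcal{P}_{T^\perp}(F(\widetilde{X}_m) + \gamma_m\widetilde{X}_m)\|\big)\|\mathcal{P}_{T^\perp}(\Delta_m)\|_* \\
&\quad= -a_m\|\mathcal{P}_T(\Delta_m)\|_* + b_m\|\mathcal{P}_{T^\perp}(\Delta_m)\|_*. \tag{51}
\end{align*}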



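By substituting (51) and (50) into (49), we obtain that

\[
\frac{1}{2m}\|\mathcal{R}_\Omega(\Delta_m)\|_2^2 \le \rho_m\Big[\Big(a_m + \frac{b_m}{\kappa}\Big)\|\mathcal{P}_T(\Delta_m)\|_* - \frac{\kappa-1}{\kappa}\,b_m\|\mathcal{P}_{T^\perp}(\Delta_m)\|_*\Big] + \frac{\rho_m\gamma_m}{2}\Big(\|\overline{X}\|_F^2 - \|\widehat{X}_m\|_F^2\Big). \tag{52}
\]

Note that $\operatorname{rank}(\mathcal{P}_T(\Delta_m)) \le 2r$. Hence, $\|\mathcal{P}_T(\Delta_m)\|_* \le \sqrt{2r}\,\|\mathcal{P}_T(\Delta_m)\|_F \le \sqrt{2r}\,\|\Delta_m\|_F$,
and the desired result follows from (52). Thus, we complete the proof.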
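Spelled out, the last step can be sketched as follows. This is an illustration under assumptions made here rather than a restatement of the theorem: it takes $\kappa > 1$ and $b_m \ge 0$, so that the term in (52) involving $\mathcal{P}_{T^\perp}(\Delta_m)$ is nonpositive and can be dropped, and it reads (52) with the coefficient $a_m + b_m/\kappa$ in front of $\|\mathcal{P}_T(\Delta_m)\|_*$.

```latex
\begin{align*}
\frac{1}{2m}\|\mathcal{R}_\Omega(\Delta_m)\|_2^2
 &\le \rho_m\Big(a_m + \frac{b_m}{\kappa}\Big)\|\mathcal{P}_T(\Delta_m)\|_*
      + \frac{\rho_m\gamma_m}{2}\Big(\|\overline{X}\|_F^2 - \|\widehat{X}_m\|_F^2\Big) \\
 &\le \sqrt{2r}\,\Big(a_m + \frac{b_m}{\kappa}\Big)\rho_m\|\Delta_m\|_F
      + \frac{\rho_m\gamma_m}{2}\Big(\|\overline{X}\|_F^2 - \|\widehat{X}_m\|_F^2\Big),
\end{align*}
% second line: \|P_T(\Delta_m)\|_* \le \sqrt{2r}\,\|\Delta_m\|_F because rank(P_T(\Delta_m)) \le 2r
```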
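Proof of Lemma 2

The proof is similar to that of [37, Lemma 12]. We need to show that the event

\[
\mathcal{E} := \Big\{\exists\,\Delta \in \mathcal{C}(r) \text{ such that } \Big|\frac{1}{m}\|\mathcal{R}_\Omega(\Delta)\|_2^2 - \langle\mathcal{Q}_\beta(\Delta),\Delta\rangle\Big| \ge \frac{1}{2}\langle\mathcal{Q}_\beta(\Delta),\Delta\rangle + 128\mu_1 d_2 r\vartheta_m^2\Big\}
\]

occurs with probability less than $2/(n_1+n_2)$. For any given $\varepsilon > 0$, we decompose $\mathcal{C}(r)$ as

\[
\mathcal{C}(r) = \bigcup_{k=1}^{\infty}\Big\{\Delta \in \mathcal{C}(r) \;\Big|\; 2^{k-1}\varepsilon \le \langle\mathcal{Q}_\beta(\Delta),\Delta\rangle \le 2^{k}\varepsilon\Big\}.
\]

For any $a > 0$, let $\mathcal{C}(r,a) := \{\Delta \in \mathcal{C}(r) \mid \langle\mathcal{Q}_\beta(\Delta),\Delta\rangle \le a\}$. Then we get $\mathcal{E} \subseteq \bigcup_{k=1}^{\infty}\mathcal{E}_k$
with

\[
\mathcal{E}_k = \Big\{\exists\,\Delta \in \mathcal{C}(r,2^{k}\varepsilon) \text{ such that } \Big|\frac{1}{m}\|\mathcal{R}_\Omega(\Delta)\|_2^2 - \langle\mathcal{Q}_\beta(\Delta),\Delta\rangle\Big| \ge 2^{k-2}\varepsilon + 128\mu_1 d_2 r\vartheta_m^2\Big\}.
\]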

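Then, we need to estimate the probability of each event $\mathcal{E}_k$. Define

\[
Z_a := \sup_{\Delta\in\mathcal{C}(r,a)}\Big|\frac{1}{m}\|\mathcal{R}_\Omega(\Delta)\|_2^2 - \langle\mathcal{Q}_\beta(\Delta),\Delta\rangle\Big|.
\]

Notice that for any $\Delta \in \mathbb{V}^{n_1\times n_2}$,

\[
\frac{1}{m}\|\mathcal{R}_\Omega(\Delta)\|_2^2 = \frac{1}{m}\sum_{i=1}^{m}\langle\Theta_{\omega_i},\Delta\rangle^2 \xrightarrow{\ a.s.\ } \mathbb{E}\big(\langle\Theta_{\omega_i},\Delta\rangle^2\big) = \langle\mathcal{Q}_\beta(\Delta),\Delta\rangle.
\]

Since $\|\mathcal{R}_\beta(\Delta)\|_\infty \le 1$ for all $\Delta \in \mathcal{C}(r)$, from Massart's Hoeffding type concentration
inequality [47, Theorem 9] for suprema of empirical processes, we have

\[
\Pr\big(Z_a \ge \mathbb{E}(Z_a) + t\big) \le \exp(-mt^2/2) \quad \forall\, t > 0. \tag{53}
\]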



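Next, we use the standard Rademacher symmetrization in the theory of empirical processes
to further derive an upper bound of $\mathbb{E}(Z_a)$. Let $\{\epsilon_1,\ldots,\epsilon_m\}$ be a Rademacher
sequence. Then, we have

\begin{align*}
\mathbb{E}(Z_a) &= \mathbb{E}\Big(\sup_{\Delta\in\mathcal{C}(r,a)}\Big|\frac{1}{m}\sum_{i=1}^{m}\langle\Theta_{\omega_i},\Delta\rangle^2 - \mathbb{E}\big(\langle\Theta_{\omega_i},\Delta\rangle^2\big)\Big|\Big) \\
&\le 2\,\mathbb{E}\Big(\sup_{\Delta\in\mathcal{C}(r,a)}\Big|\frac{1}{m}\sum_{i=1}^{m}\epsilon_i\langle\Theta_{\omega_i},\Delta\rangle^2\Big|\Big) \le 8\,\mathbb{E}\Big(\sup_{\Delta\in\mathcal{C}(r,a)}\Big|\frac{1}{m}\sum_{i=1}^{m}\epsilon_i\langle\Theta_{\omega_i},\Delta\rangle\Big|\Big) \\
&= 8\,\mathbb{E}\Big(\sup_{\Delta\in\mathcal{C}(r,a)}\Big|\Big\langle\frac{1}{m}\mathcal{R}_\Omega^*(\epsilon),\Delta\Big\rangle\Big|\Big) \le 8\,\mathbb{E}\Big\|\frac{1}{m}\mathcal{R}_\Omega^*(\epsilon)\Big\|\Big(\sup_{\Delta\in\mathcal{C}(r,a)}\|\Delta\|_*\Big), \tag{54}
\end{align*}

where the first inequality follows from the symmetrization theorem (e.g., see [69, Lemma
2.3.1] and [6, Theorem 14.3]) and the second inequality follows from the contraction
theorem (e.g., see [42, Theorem 4.12] and [6, Theorem 14.4]). Moreover, from (16), we
have

\[
\|\Delta\|_* \le \sqrt{r}\,\|\Delta\|_F \le \sqrt{\mu_1 r d_2\langle\mathcal{Q}_\beta(\Delta),\Delta\rangle} \le \sqrt{\mu_1 r d_2 a} \quad \forall\,\Delta \in \mathcal{C}(r,a). \tag{55}
\]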



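Combining (54) and (55) with the definition of $\vartheta_m$ in (17), we obtain that

\[
\mathbb{E}(Z_a) \le 8\vartheta_m\sqrt{\mu_1 r d_2 a} \le \frac{a}{8} + 128\mu_1 r d_2\vartheta_m^2.
\]

Then, by choosing $t = a/8$ in (53), it follows that

\[
\Pr\Big(Z_a \ge \frac{a}{4} + 128\mu_1 r d_2\vartheta_m^2\Big) \le \Pr\Big(Z_a \ge \mathbb{E}(Z_a) + \frac{a}{8}\Big) \le \exp\Big(-\frac{ma^2}{128}\Big).
\]

This implies that $\Pr(\mathcal{E}_k) \le \exp(-4^k\varepsilon^2 m/128)$. Then, by choosing $\varepsilon = \sqrt{\frac{64\log(n_1+n_2)}{\log(2)\,m}}$ and
using $e^x \ge 1 + x > x$, we have

\[
\Pr(\mathcal{E}) \le \sum_{k=1}^{\infty}\Pr(\mathcal{E}_k) \le \sum_{k=1}^{\infty}\exp\Big(-\frac{4^k\varepsilon^2 m}{128}\Big) < \sum_{k=1}^{\infty}\exp\Big(-\frac{\log(4)k\varepsilon^2 m}{128}\Big) = \frac{\exp(-\log(2)m\varepsilon^2/64)}{1 - \exp(-\log(2)\varepsilon^2 m/64)} = \frac{1}{n_1+n_2-1}.
\]

Thus, we complete the proof.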
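The union bound at the end of this proof can be checked numerically. The sketch below is only a sanity check under an assumption made here: it takes $\varepsilon^2 = 64\log(n_1+n_2)/(\log(2)\,m)$, which is the value for which the geometric series sums to exactly $1/(n_1+n_2-1)$; the function name `union_bound_check` and the sample sizes are illustrative.

```python
import math


def union_bound_check(n_total: int, m: int, terms: int = 60):
    """Compare sum_k exp(-4^k eps^2 m / 128) with its geometric majorant.

    n_total plays the role of n_1 + n_2. Assumes (our reading of the proof)
    eps^2 = 64 * log(n_total) / (log(2) * m).
    """
    eps_sq = 64.0 * math.log(n_total) / (math.log(2.0) * m)
    # Sum of the bounds Pr(E_k) <= exp(-4^k eps^2 m / 128); terms underflow to 0 quickly.
    exact = sum(math.exp(-(4.0 ** k) * eps_sq * m / 128.0)
                for k in range(1, terms + 1))
    # Geometric majorant obtained from 4^k > k*log(4): equals 1/(n_total - 1).
    geometric = 1.0 / (n_total - 1)
    # Target probability stated in the lemma.
    target = 2.0 / n_total
    return exact, geometric, target


exact, geometric, target = union_bound_check(n_total=100, m=1000)
```

For these values the true sum is several orders of magnitude below the geometric majorant, which in turn sits below $2/(n_1+n_2)$ whenever $n_1+n_2 \ge 2$.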

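Proof of Theorem 3

The proof is similar to that of [37, Theorem 3]. Let $\Delta_m^c := \widehat{X}_m^c - \overline{X}$. By noting that
$\gamma_m = 0$ in this case, from (52), we have

\[
\Big(a_m + \frac{b_m}{\kappa}\Big)\|\mathcal{P}_T(\Delta_m^c)\|_* - \frac{\kappa-1}{\kappa}\,b_m\|\mathcal{P}_{T^\perp}(\Delta_m^c)\|_* \ge 0.
\]

Then, by setting $t_m := \frac{\kappa}{\kappa-1}\big(1 + \frac{a_m}{b_m}\big)$, together with the above inequality, we obtain that

\[
\|\Delta_m^c\|_* \le \|\mathcal{P}_T(\Delta_m^c)\|_* + \|\mathcal{P}_{T^\perp}(\Delta_m^c)\|_* \le t_m\|\mathcal{P}_T(\Delta_m^c)\|_* \le \sqrt{2r}\,t_m\|\Delta_m^c\|_F. \tag{56}
\]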
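The constant $t_m$ arises from a short computation, sketched here for a generic matrix $\Delta$ under assumptions made for illustration: $\kappa > 1$ and $b_m > 0$, and the cone-type inequality $\big(a_m + \tfrac{b_m}{\kappa}\big)\|\mathcal{P}_T(\Delta)\|_* \ge \tfrac{\kappa-1}{\kappa}\,b_m\|\mathcal{P}_{T^\perp}(\Delta)\|_*$ as the starting point.

```latex
\begin{align*}
\|\mathcal{P}_{T^\perp}(\Delta)\|_*
  &\le \frac{\kappa a_m + b_m}{(\kappa-1)\,b_m}\,\|\mathcal{P}_T(\Delta)\|_*, \\
\|\mathcal{P}_T(\Delta)\|_* + \|\mathcal{P}_{T^\perp}(\Delta)\|_*
  &\le \Big(1 + \frac{\kappa a_m + b_m}{(\kappa-1)\,b_m}\Big)\|\mathcal{P}_T(\Delta)\|_*
   = \frac{\kappa\,(a_m + b_m)}{(\kappa-1)\,b_m}\,\|\mathcal{P}_T(\Delta)\|_* \\
  &= \frac{\kappa}{\kappa-1}\Big(1 + \frac{a_m}{b_m}\Big)\|\mathcal{P}_T(\Delta)\|_* .
\end{align*}
```

A smaller ratio $a_m/b_m$ therefore gives a smaller constant, which is consistent with the role this ratio plays in Table 1.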

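Let $c_m := \|\mathcal{R}_\beta(\Delta_m^c)\|_\infty$. Clearly, $c_m \le 2c$. We proceed the discussions by two cases:

Case 1. Suppose that $\langle\mathcal{Q}_\beta(\Delta_m^c),\Delta_m^c\rangle \le c_m^2\sqrt{\frac{64\log(n_1+n_2)}{\log(2)\,m}}$. From (16), we obtain that

\[
\frac{\|\Delta_m^c\|_F^2}{d_2} \le \mu_1 c_m^2\sqrt{\frac{64\log(n_1+n_2)}{\log(2)\,m}}.
\]

Case 2. Suppose that $\langle\mathcal{Q}_\beta(\Delta_m^c),\Delta_m^c\rangle > c_m^2\sqrt{\frac{64\log(n_1+n_2)}{\log(2)\,m}}$. Then, from (56), we have
$\Delta_m^c/c_m \in \mathcal{C}(2t_m^2 r)$. Together with Lemma 2, it follows that

\[
\frac{1}{2}\langle\mathcal{Q}_\beta(\Delta_m^c),\Delta_m^c\rangle \le \frac{1}{m}\|\mathcal{R}_\Omega(\Delta_m^c)\|_2^2 + 128\,c_m^2\mu_1 d_2\,(2t_m^2 r)\,\vartheta_m^2.
\]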
Combining the last inequality with Theorem 1 and equation (16), we obtain that



By plugging in tm, we have that there exists some constant Ci such that 



< Ciflid2r Om H p„ + — TT^T^— c -i?^ 



This, together with Case 1, completes the proof.

Proof of Lemma 5

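Recall that $\frac{1}{m}\mathcal{R}_\Omega^*(\xi) = \frac{1}{m}\sum_{i=1}^{m}\xi_i\Theta_{\omega_i}$. Let $Z_i := \xi_i\Theta_{\omega_i}$. Since $\mathbb{E}(\xi_i) = 0$, the independence
of $\xi_i$ and $\Theta_{\omega_i}$ implies that $\mathbb{E}(Z_i) = 0$. Since $\|\Theta_{\omega_i}\|_F = 1$, we have that

\[
\|Z_i\| \le \|Z_i\|_F = |\xi_i|\,\|\Theta_{\omega_i}\|_F = |\xi_i|.
\]

It follows that $\big\|\,\|Z_i\|\,\big\|_{\psi_1} \le \|\xi_i\|_{\psi_1}$. Thus, $\big\|\,\|Z_i\|\,\big\|_{\psi_1}$ is finite since $\xi_i$ is sub-exponential.
Meanwhile, $\mathbb{E}^{1/2}(\|Z_i\|^2) \le \mathbb{E}^{1/2}(\|Z_i\|_F^2) = \mathbb{E}^{1/2}(\xi_i^2) = 1$. We also have

\[
\mathbb{E}(Z_iZ_i^T) = \mathbb{E}\big(\xi_i^2\,\Theta_{\omega_i}\Theta_{\omega_i}^T\big) = \mathbb{E}\big(\Theta_{\omega_i}\Theta_{\omega_i}^T\big) = \sum_{k\in\beta}p_k\,\Theta_k\Theta_k^T.
\]

The calculation of $\mathbb{E}(Z_i^TZ_i)$ is similar. From (18), we obtain that $\sqrt{1/n} \le \sigma_Z \le \sqrt{\mu_2/n}$.
Then, applying the noncommutative Bernstein inequality yields (19). The proof of (20)
is exactly the same as the proof of Lemma 6 in [37]. For simplicity, we omit the proof.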

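Proof of Lemma 7

(i) From the definition of the sampling operator $\mathcal{R}_\Omega$ and its adjoint $\mathcal{R}_\Omega^*$, we have

\[
\frac{1}{m}\mathcal{R}_\Omega^*\mathcal{R}_\Omega(X) = \frac{1}{m}\sum_{i=1}^{m}\langle\Theta_{\omega_i},X\rangle\,\Theta_{\omega_i}.
\]

This is an average value of m i.i.d. random matrices $\langle\Theta_{\omega_i},X\rangle\Theta_{\omega_i}$. It is easy to see that
$\mathbb{E}(\langle\Theta_{\omega_i},X\rangle\Theta_{\omega_i}) = \mathcal{Q}_\beta(X)$. The result then follows directly from the strong law of large
numbers.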
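The law-of-large-numbers statement in part (i) can be illustrated with a small simulation. The sketch makes simplifying assumptions that are not part of the lemma: the basis matrices $\Theta_{\omega}$ are taken to be the standard entry-wise basis $e_ie_j^T$, the index set $\beta$ is the off-diagonal positions of a $3\times 3$ matrix, and sampling is uniform over $\beta$, in which case $\mathcal{Q}_\beta(X)$ is $\mathcal{P}_\beta(X)$ divided by the number of indices in $\beta$. All names are illustrative.

```python
import random


def empirical_sampling_average(X, beta, m, seed=0):
    """Return (1/m) * sum_i <Theta_{omega_i}, X> Theta_{omega_i} for uniform draws from beta."""
    rng = random.Random(seed)
    n1, n2 = len(X), len(X[0])
    avg = [[0.0] * n2 for _ in range(n1)]
    for _ in range(m):
        i, j = rng.choice(beta)    # uniform draw of one index omega in beta
        avg[i][j] += X[i][j] / m   # <e_i e_j^T, X> e_i e_j^T has the single entry X[i][j]
    return avg


X = [[1.0, 2.0, -1.0], [0.5, 3.0, 4.0], [-2.0, 1.5, 2.5]]
beta = [(i, j) for i in range(3) for j in range(3) if i != j]  # assumed: diagonal is fixed

avg = empirical_sampling_average(X, beta, m=200_000)
# Population limit under uniform sampling: P_beta(X) / |beta|
limit = [[X[i][j] / len(beta) if i != j else 0.0 for j in range(3)] for i in range(3)]
max_gap = max(abs(avg[i][j] - limit[i][j]) for i in range(3) for j in range(3))
```

With 200,000 draws the entry-wise gap is on the order of $10^{-3}$, shrinking at the usual $m^{-1/2}$ rate.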

(ii) From the definition of $\mathcal{R}_\Omega^*$ and $\mathcal{R}_{\alpha\cup\beta}$, it is immediate to obtain that
1 1 / m . ^ m 
Since $\mathbb{E}(\xi_i) = 0$ and $\mathbb{E}(\xi_i^2) = 1$, from the independence of $\xi_i$ and $\mathcal{R}_{\alpha\cup\beta}(\Theta_{\omega_i})$, we have

^{^iT^au/siQuii)) = and cov[^i,Tlaui3{®u}i)) = Pi- Applying the central limit theorem 
then yields the desired result. 

Proof of Theorem 10

Let $\Phi_m$ denote the objective function of (7) and $\mathcal{F}$ denote the feasible set. Then, the
problem (7) can be concisely written as

\[
\min\ \Phi_m(X) + \delta_{\mathcal{F}}(X).
\]

By Assumptions 3 and 4 and Lemma 7, we have that $\Phi_m$ converges pointwise in probability
to $\Phi$, where $\Phi(X) := \frac{1}{2}\|\mathcal{Q}_\beta^{1/2}(X - \overline{X})\|_F^2$ for any $X \in \mathbb{V}^{n_1\times n_2}$. As a direct extension
of Rockafellar [62, Theorem 10.8], Andersen and Gill [1, Theorem II.1] proved that the
pointwise convergence in probability of a sequence of random convex functions implies
the uniform convergence in probability on any compact subset. Then, from Proposition
9, we obtain that $\Phi_m + \delta_{\mathcal{F}}$ epi-converges in distribution to $\Phi + \delta_{\mathcal{F}}$. Note that $\overline{X}$ is the
unique minimizer of $\Phi(X) + \delta_{\mathcal{F}}(X)$ since $\Phi(X)$ is strongly convex over the feasible set
$\mathcal{F}$. Using the convexity of $\Phi_m$ and $\Phi$, we complete the proof by Proposition 8.

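Proof of Lemma 12

By replacing $X$ and $H$ in (23) with $\overline{X}$ and $\rho\Delta$, respectively, and noting that $\sigma_{r+1}(\overline{X}) = 0$,
we have $\sigma_{r+1}(\overline{X} + \rho\Delta) - \|U_2^T(\rho\Delta)V_2\| = O(\|\rho\Delta\|_F^2)$. Since $U_2^T\widehat{\Delta}V_2 \ne 0$, for any $\rho \ne 0$
sufficiently small and $\Delta$ sufficiently close to $\widehat{\Delta}$,

\begin{align*}
\frac{\sigma_{r+1}(\overline{X} + \rho\Delta)}{|\rho|} &= \|U_2^T\Delta V_2\| + O(|\rho|\,\|\Delta\|_F^2) \\
&\ge \|U_2^T\widehat{\Delta}V_2\| - \|U_2^T(\Delta - \widehat{\Delta})V_2\| + O(|\rho|\,\|\Delta\|_F^2) \tag{57} \\
&\ge \frac{1}{2}\|U_2^T\widehat{\Delta}V_2\| > 0.
\end{align*}

This implies that $\operatorname{rank}(\overline{X} + \rho\Delta) > r$.
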
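Proof of Proposition 13

By letting $\Delta := \rho_m^{-1}(X - \overline{X})$ in the optimization problem (24), one can easily see that
$\widehat{\Delta}_m$ is the optimal solution to

\[
\begin{aligned}
\min_{\Delta\in\mathbb{V}^{n_1\times n_2}}\ & \frac{1}{2m}\|\mathcal{R}_\Omega(\Delta)\|_2^2 - \frac{\nu}{m\rho_m}\langle\mathcal{R}_\Omega^*(\xi),\Delta\rangle + \frac{1}{\rho_m}\big(\|\overline{X} + \rho_m\Delta\|_* - \|\overline{X}\|_*\big) \\
& - \langle F(\widetilde{X}_m),\Delta\rangle + \frac{\rho_m\gamma_m}{2}\|\Delta\|_F^2 + \gamma_m\langle\overline{X} - \widetilde{X}_m,\Delta\rangle \\
\text{s.t.}\ \ & \mathcal{R}_\alpha(\Delta) = 0.
\end{aligned} \tag{58}
\]

Let $\Phi_m$ and $\Phi$ denote the objective functions of (58) and (24), respectively. Let $\mathcal{F}$ denote
the feasible set of (24). By the definition of directional derivative and [71, Theorem 1],

\[
\lim_{\rho_m\to 0}\frac{1}{\rho_m}\big(\|\overline{X} + \rho_m\Delta\|_* - \|\overline{X}\|_*\big) = \langle U_1V_1^T,\Delta\rangle + \|U_2^T\Delta V_2\|_*.
\]

Then, by combining Assumptions 3 and 4 with Lemma 7, we obtain that $\Phi_m$ converges
pointwise in probability to $\Phi$. By using the same argument as in the proof of Theorem
10, we obtain that $\Phi_m + \delta_{\mathcal{F}}$ epi-converges in distribution to $\Phi + \delta_{\mathcal{F}}$. Moreover, the
optimal solution to (24) is unique due to the strong convexity of $\Phi$ over the feasible set
$\mathcal{F}$. Therefore, we complete the proof by applying Proposition 8 on the epi-convergence.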



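Proof of Lemma 14

Assume that $U_2^T\widehat{\Delta}V_2 = 0$. Since $\widehat{\Delta}$ is the optimal solution to (24), from the optimality
condition, the subdifferential of $\|X\|_*$ at 0, and [62, Theorem 23.7], we obtain that there
exist some $\widehat{\Gamma} \in \mathbb{V}^{(n_1-r)\times(n_2-r)}$ with $\|\widehat{\Gamma}\| \le 1$ and $\widehat{\eta} \in \mathbb{R}^{d_1}$ such that

\[
\begin{cases}
\mathcal{Q}_\beta(\widehat{\Delta}) + U_1V_1^T - F(\overline{X}) - \mathcal{R}_\alpha^*(\widehat{\eta}) + U_2\widehat{\Gamma}V_2^T = 0, \\
\mathcal{R}_\alpha(\widehat{\Delta}) = 0.
\end{cases} \tag{59}
\]

Then, according to (4), we can easily obtain (26) by applying the operator $\mathcal{Q}_\beta^\dagger$ to the
first equation of (59) and using the second equation. By further combining (26) and
$U_2^T\widehat{\Delta}V_2 = 0$, we obtain that $\widehat{\Gamma}$ is a solution to the linear system (25).

Conversely, if the linear system (25) has a solution $\widehat{\Gamma}$ with $\|\widehat{\Gamma}\| \le 1$, then it is easy
to check that (59) is satisfied with $\widehat{\Delta}$ being given by (26) and $\widehat{\eta} = \mathcal{R}_\alpha(U_1V_1^T - F(\overline{X}) + U_2\widehat{\Gamma}V_2^T)$.
Consequently, $U_2^T\widehat{\Delta}V_2 = 0$ follows directly from the equations (25) and (26).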



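Proof of Theorem 16

The estimator $\widehat{X}_m$ is the optimal solution to (7) if and only if there exist a subgradient
$G_m$ of the nuclear norm at $\widehat{X}_m$ and a vector $\eta_m \in \mathbb{R}^{d_1}$ such that $(\widehat{X}_m, \eta_m)$ satisfies the
KKT conditions:

\[
\begin{cases}
\dfrac{1}{m}\mathcal{R}_\Omega^*\big(\mathcal{R}_\Omega(\widehat{X}_m) - y\big) + \rho_m\big(G_m - F(\widetilde{X}_m) + \gamma_m(\widehat{X}_m - \widetilde{X}_m)\big) - \mathcal{R}_\alpha^*(\eta_m) = 0, \\[4pt]
\mathcal{R}_\alpha(\widehat{X}_m) = \mathcal{R}_\alpha(\overline{X}).
\end{cases} \tag{60}
\]

Let $(U_m, V_m) \in \mathbb{O}^{n_1,n_2}(\widehat{X}_m)$ with $U_{m,1} \in \mathbb{O}^{n_1\times r}$, $U_{m,2} \in \mathbb{O}^{n_1\times(n_1-r)}$, $V_{m,1} \in \mathbb{O}^{n_2\times r}$ and
$V_{m,2} \in \mathbb{O}^{n_2\times(n_2-r)}$. From Theorem 10 and Corollary 11, we know that $\operatorname{rank}(\widehat{X}_m) \ge r$
with probability one. When $\operatorname{rank}(\widehat{X}_m) \ge r$ holds, then from the characterization
of the subdifferential of the nuclear norm [71, 72], we have that $G_m = U_{m,1}V_{m,1}^T +
U_{m,2}\Gamma_mV_{m,2}^T$ for some $\Gamma_m \in \mathbb{V}^{(n_1-r)\times(n_2-r)}$ satisfying $\|\Gamma_m\| \le 1$. Moreover, if $\|\Gamma_m\| < 1$,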

then $\operatorname{rank}(\widehat{X}_m) = r$. Since $\widehat{X}_m \xrightarrow{p} \overline{X}$, by [17, Proposition 8] we have $U_{m,1}V_{m,1}^T \xrightarrow{p} U_1V_1^T$.
Together with Lemma 7, the equation (5) and Lemma 14, it is not hard to obtain that



-nh{nn{Xm) -y)+ Um,lVZ,i - F{Xm) + -fm{Xm " ^m) 

4 Q^{A) + UiVl - F(X) 



U2rvl, 



(61) 
where the equality follows from (26) and $\widehat{\Gamma}$ is the unique optimal solution to (25). Then,
by applying the operator $\mathcal{Q}_\beta^\dagger$ to (60), we obtain from (61) that

\[
U_2^T\mathcal{Q}_\beta^\dagger\big(U_{m,2}\Gamma_mV_{m,2}^T\big)V_2 \xrightarrow{d} U_2^T\mathcal{Q}_\beta^\dagger\big(U_2\widehat{\Gamma}V_2^T\big)V_2. \tag{62}
\]



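Since $\widehat{X}_m \xrightarrow{p} \overline{X}$, according to [17, Proposition 7], there exist two sequences of matrices
$Q_{m,U} \in \mathbb{O}^{n_1-r}$ and $Q_{m,V} \in \mathbb{O}^{n_2-r}$ such that

\[
U_{m,2}Q_{m,U} \xrightarrow{p} U_2 \quad\text{and}\quad V_{m,2}Q_{m,V} \xrightarrow{p} V_2. \tag{63}
\]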



Moreover, the uniqueness of the solution to the linear system (25) is equivalent to the
non-singularity of its linear operator. By combining (62) and (63), we obtain that
$Q_{m,U}^T\Gamma_mQ_{m,V} \xrightarrow{d} \widehat{\Gamma}$. Hence, we obtain that $\|\Gamma_m\| < 1$ with probability one since $\|\widehat{\Gamma}\| < 1$.
As discussed above, it follows that $\operatorname{rank}(\widehat{X}_m) = r$ with probability one.



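Proof of Proposition 17

It is easy to verify that $\widehat{\Delta}_m$ is the optimal solution to

\[
\begin{aligned}
\min_{\Delta\in\mathbb{S}^n}\ & \frac{1}{2m}\|\mathcal{R}_\Omega(\Delta)\|_2^2 - \frac{\nu}{m\rho_m}\langle\mathcal{R}_\Omega^*(\xi),\Delta\rangle + \langle I_n - F(\widetilde{X}_m),\Delta\rangle + \frac{\rho_m\gamma_m}{2}\|\Delta\|_F^2 + \gamma_m\langle\overline{X} - \widetilde{X}_m,\Delta\rangle \\
\text{s.t.}\ \ & \Delta \in \mathcal{F}_m := \rho_m^{-1}\big(\mathcal{C}\cap\mathbb{S}^n_+ - \overline{X}\big),
\end{aligned} \tag{64}
\]

where $\mathcal{C} := \{X \in \mathbb{S}^n \mid \mathcal{R}_\alpha(X) = \mathcal{R}_\alpha(\overline{X})\}$. Let $\Phi_m$ and $\Phi$ denote the objective functions
of (64) and (27), respectively. Then $\Phi_m$ converges pointwise in probability to $\Phi$.
Moreover, by considering the upper limit and lower limit of the family of feasible sets
$\mathcal{F}_m$, we know that $\mathcal{F}_m$ converges in the sense of Painlevé-Kuratowski to the tangent cone
$\mathcal{T}_{\mathcal{C}\cap\mathbb{S}^n_+}(\overline{X})$ (see [63, 5]). Note that the Slater condition implies that $\mathcal{C}$ and $\mathbb{S}^n_+$ cannot
be separated. Then, from [63, Theorem 6.42], we have $\mathcal{T}_{\mathcal{C}\cap\mathbb{S}^n_+}(\overline{X}) = \mathcal{T}_{\mathcal{C}}(\overline{X}) \cap \mathcal{T}_{\mathbb{S}^n_+}(\overline{X})$.
Clearly, $\mathcal{T}_{\mathcal{C}}(\overline{X}) = \{\Delta \in \mathbb{S}^n \mid \mathcal{R}_\alpha(\Delta) = 0\}$. Moreover, by Arnold [2],

\[
\mathcal{T}_{\mathbb{S}^n_+}(\overline{X}) = \{\Delta \in \mathbb{S}^n \mid P_2^T\Delta P_2 \in \mathbb{S}^{n-r}_+\}.
\]

Since epi-convergence of functions corresponds to set convergence of their epigraphs [63],
we obtain that $\delta_{\mathcal{F}_m}$ epi-converges to $\delta_{\mathcal{T}_{\mathcal{C}\cap\mathbb{S}^n_+}(\overline{X})} = \delta_{\mathcal{T}_{\mathcal{C}}(\overline{X})} + \delta_{\mathcal{T}_{\mathbb{S}^n_+}(\overline{X})}$. Then, from Proposition 9,
$\Phi_m + \delta_{\mathcal{F}_m}$ epi-converges in distribution to $\Phi + \delta_{\mathcal{T}_{\mathcal{C}}(\overline{X})} + \delta_{\mathcal{T}_{\mathbb{S}^n_+}(\overline{X})}$. In addition, the optimal
solution to (27) is unique due to the strong convexity of $\Phi$ over the feasible set $\mathcal{C}\cap\mathbb{S}^n_+$.
Therefore, we complete the proof by applying Proposition 8 on the epi-convergence.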



Proof of Lemma 

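Note that the Slater condition also holds for the problem (27). (One may check the
point $X^0 - \overline{X}$.) Hence, $\widehat{\Delta}$ is the optimal solution to (27) if and only if there exists
$(\widehat{\zeta}, \widehat{\Lambda}) \in \mathbb{R}^{d_1}\times\mathbb{S}^{n-r}$ such that

\[
\begin{cases}
\mathcal{Q}_\beta(\widehat{\Delta}) + I_n - F(\overline{X}) - \mathcal{R}_\alpha^*(\widehat{\zeta}) - P_2\widehat{\Lambda}P_2^T = 0, \\
\mathcal{R}_\alpha(\widehat{\Delta}) = 0, \\
P_2^T\widehat{\Delta}P_2 \in \mathbb{S}^{n-r}_+, \quad \widehat{\Lambda} \in \mathbb{S}^{n-r}_+, \quad \langle P_2^T\widehat{\Delta}P_2, \widehat{\Lambda}\rangle = 0.
\end{cases} \tag{65}
\]

Applying the operator $\mathcal{Q}_\beta^\dagger$ to the first equation of (65) yields the equality (29). Assume
that $P_2^T\widehat{\Delta}P_2 = 0$. Then, it is immediate to obtain from (26) that $\widehat{\Lambda}$ is a solution to the
linear system (28).

Conversely, if the linear system (28) has a solution $\widehat{\Lambda} \in \mathbb{S}^{n-r}_+$, then it is easy to check
that (65) is satisfied with $\widehat{\Delta}$ given by (29) and $\widehat{\zeta} = \mathcal{R}_\alpha(I_n - F(\overline{X}) - P_2\widehat{\Lambda}P_2^T)$. Then
$P_2^T\widehat{\Delta}P_2 = 0$ directly follows from (29) and the first equation of (65).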



Proof of Theorem | 

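The Slater condition implies that $\widehat{X}_m$ is the optimal solution to (8) if and only if there
exist multipliers $(\zeta_m, S_m) \in \mathbb{R}^{d_1}\times\mathbb{S}^n$ such that $(\widehat{X}_m, \zeta_m, S_m)$ satisfy the KKT conditions:

\[
\begin{cases}
\dfrac{1}{m}\mathcal{R}_\Omega^*\big(\mathcal{R}_\Omega(\widehat{X}_m) - y\big) + \rho_m\big(I_n - F(\widetilde{X}_m) + \gamma_m(\widehat{X}_m - \widetilde{X}_m)\big) - \mathcal{R}_\alpha^*(\zeta_m) - S_m = 0, \\[4pt]
\mathcal{R}_\alpha(\widehat{X}_m) = \mathcal{R}_\alpha(\overline{X}), \\
\widehat{X}_m \in \mathbb{S}^n_+, \quad S_m \in \mathbb{S}^n_+, \quad \langle\widehat{X}_m, S_m\rangle = 0.
\end{cases} \tag{66}
\]

The third equation of (66) implies that $\widehat{X}_m$ and $S_m$ can have a simultaneous eigenvalue
decomposition. Let $P_m \in \mathbb{O}^n(\widehat{X}_m)$ with $P_{m,1} \in \mathbb{V}^{n\times r}$ and $P_{m,2} \in \mathbb{V}^{n\times(n-r)}$. From
Theorem 10 and Corollary 11, we know that $\operatorname{rank}(\widehat{X}_m) \ge r$ with probability one. When
$\operatorname{rank}(\widehat{X}_m) \ge r$ holds, we can write $S_m = P_{m,2}\Lambda_mP_{m,2}^T$ for some diagonal matrix $\Lambda_m \in \mathbb{S}^{n-r}_+$.
In addition, if $\Lambda_m \in \mathbb{S}^{n-r}_{++}$, then $\operatorname{rank}(\widehat{X}_m) = r$.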



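Since $\widehat{X}_m \xrightarrow{p} \overline{X}$, according to [17, Proposition 1], there exists a sequence of matrices
$Q_m \in \mathbb{O}^{n-r}$ such that $P_{m,2}Q_m \xrightarrow{p} P_2$. Then, using the similar arguments to the proof of
Theorem 16, we obtain that $Q_m^T\Lambda_mQ_m \xrightarrow{d} \widehat{\Lambda}$. Since $\widehat{\Lambda} \in \mathbb{S}^{n-r}_{++}$, we have $\Lambda_m \in \mathbb{S}^{n-r}_{++}$ with
probability one. Thus, we complete the proof.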



Proof of Proposition 21

For the real covariance matrix case, the proof is given in [57, Lemma 3.3] and [58, Proposition
2.1]. For the complex covariance matrix case, one can use similar arguments
to prove the result.

We next consider the density matrix case. Suppose that $X$ satisfies the density constraint,
i.e., $\mathcal{R}_\alpha(X) = \operatorname{Tr}(X) = 1$. Note that for any $t \in \mathbb{R}$, we have $tX \in \operatorname{lin}(\mathcal{T}_{\mathbb{H}^n_+}(X))$.
This, along with $\operatorname{Tr}(X) = 1$, implies that $\operatorname{Tr}(\operatorname{lin}(\mathcal{T}_{\mathbb{H}^n_+}(X))) = \mathcal{R}_\alpha(\operatorname{lin}(\mathcal{T}_{\mathbb{H}^n_+}(X))) = \mathbb{R}$.
This means that the constraint nondegeneracy condition (35) holds.



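Proof of Proposition 22

We prove for the rectangular case by contradiction. Assume that there exists some
$\Gamma \in \mathbb{V}^{(n_1-r)\times(n_2-r)}$ with $\Gamma \ne 0$ such that $\mathcal{B}_2(\Gamma) = U_2^T\mathcal{Q}_\beta^\dagger(U_2\Gamma V_2^T)V_2 = 0$. By noting that
$\mathcal{Q}_\beta^\dagger$ is a self-adjoint and positive semidefinite operator, we obtain $(\mathcal{Q}_\beta^\dagger)^{1/2}(U_2\Gamma V_2^T) = 0$.
It follows that $\mathcal{P}_\beta(U_2\Gamma V_2^T) = 0$. This, together with $\Gamma \ne 0$, implies that $U_2\Gamma V_2^T =
\mathcal{P}_\alpha(U_2\Gamma V_2^T) \ne 0$ and moreover $\mathcal{R}_\alpha(U_2\Gamma V_2^T) \ne 0$. However, for any $H \in \mathcal{T}(\overline{X})$, we have

\[
\langle\mathcal{R}_\alpha(U_2\Gamma V_2^T), \mathcal{R}_\alpha(H)\rangle = \langle\mathcal{P}_\alpha(U_2\Gamma V_2^T), H\rangle = \langle U_2\Gamma V_2^T, H\rangle = \langle\Gamma, U_2^THV_2\rangle = 0.
\]

Thus, the constraint nondegeneracy condition (33) implies that $\mathcal{R}_\alpha(U_2\Gamma V_2^T) = 0$. This
leads to a contradiction. Therefore, the linear operator $\mathcal{B}_2$ is positive definite. The proof
for the positive semidefinite case is similar.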



Proof of Theorem 



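From Propositions 21 and 22, for both cases, the linear system (28) has a unique solution
$\widehat{\Lambda}$. Moreover, uniform sampling results in $\mathcal{Q}_\beta = \mathcal{P}_\beta/d_2$. Thus, from (28), we get

\[
\widehat{\Lambda} - P_2^T\mathcal{P}_\alpha(P_2\widehat{\Lambda}P_2^T)P_2 = P_2^T\mathcal{P}_\beta(P_2\widehat{\Lambda}P_2^T)P_2 = P_2^T\mathcal{P}_\beta(I_n - F(\overline{X}))P_2. \tag{67}
\]

We first prove the covariance matrix completion case by contradiction. Without loss of
generality, we assume that the first $l$ diagonal entries are fixed and positive. Then, for
any $X \in \mathbb{S}^n$, $\mathcal{P}_\alpha(X)$ is the diagonal matrix whose first $l$ diagonal entries are $X_{ii}$, $1 \le i \le l$,
respectively and the other entries are 0. Assume that $\widehat{\Lambda} \notin \mathbb{S}^{n-r}_{++}$, i.e., $\lambda_{\min}(\widehat{\Lambda}) \le 0$, where
$\lambda_{\min}(\cdot)$ denotes the smallest eigenvalue. Then, we have

\[
\lambda_{\min}(\widehat{\Lambda}) = \lambda_{\min}(P_2\widehat{\Lambda}P_2^T) \le \lambda_{\min}\big(\mathcal{P}_\alpha(P_2\widehat{\Lambda}P_2^T)\big) \le \lambda_{\min}\big(P_2^T\mathcal{P}_\alpha(P_2\widehat{\Lambda}P_2^T)P_2\big),
\]

where the equality follows from the fact that $\widehat{\Lambda}$ and $P_2\widehat{\Lambda}P_2^T$ have the same nonzero
eigenvalues, the first inequality follows from the fact that the vector of diagonal entries is
majorized by the vector of eigenvalues (e.g., see [46, Theorem 9.B.1]), and the second
inequality follows from the Courant-Fischer minmax theorem (e.g., see [46, Theorem
20.A.1]). As a result, the left-hand side of (67) is not positive definite. Notice that
$P_2^TF(\overline{X})P_2 = 0$. Thus, the right-hand side of (67) can be written as

\[
P_2^T\mathcal{P}_\beta(I_n - F(\overline{X}))P_2 = P_2^T\mathcal{P}_\beta(I_n)P_2 + P_2^T\mathcal{P}_\alpha(F(\overline{X}))P_2 = P_2^T\big(\mathcal{P}_\beta(I_n) + \mathcal{P}_\alpha(F(\overline{X}))\big)P_2.
\]

Since $\operatorname{rank}(\overline{X}) = r$, with the choice (42) of $F$, we have that for any $1 \le i \le l$,

\[
\overline{X}_{ii} = \sum_{j=1}^{r}\lambda_j(\overline{X})\,|P_{ij}|^2 > 0 \quad\text{implies}\quad \big(F(\overline{X})\big)_{ii} = \sum_{j=1}^{r}f_j\big(\lambda_j(\overline{X})\big)\,|P_{ij}|^2 > 0.
\]

Moreover, $\mathcal{P}_\beta(I_n)$ is the diagonal matrix with the last $n - l$ diagonal entries being ones
and the other entries being zeros. Thus, $\mathcal{P}_\beta(I_n) + \mathcal{P}_\alpha(F(\overline{X}))$ is a diagonal matrix with all
positive diagonal entries. It follows that the right-hand side of (67) is positive definite.
Thus, we obtain a contradiction. Hence, $\widehat{\Lambda} \in \mathbb{S}^{n-r}_{++}$. Then the rank consistency follows from
the rank consistency theorem established above for the positive semidefinite case.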
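The first inequality in the eigenvalue chain rests on the fact that the smallest eigenvalue of a symmetric matrix never exceeds its smallest diagonal entry. A minimal numerical check for the 2-by-2 case is sketched below; the closed-form eigenvalue formula is standard, while the function name and the sample matrices are chosen here for illustration only.

```python
import math


def min_eigenvalue_2x2(a: float, b: float, d: float) -> float:
    """Smallest eigenvalue of the symmetric matrix [[a, b], [b, d]]."""
    # Eigenvalues are (a+d)/2 +/- sqrt(((a-d)/2)^2 + b^2).
    return 0.5 * (a + d) - math.hypot(0.5 * (a - d), b)


# (a, b, d) triples: generic, strongly coupled, and already diagonal
samples = [(2.0, 1.0, 3.0), (1.0, -2.0, 1.0), (0.5, 0.0, -1.0)]

# gap = (smallest diagonal entry) - (smallest eigenvalue), expected to be >= 0
gaps = [min(a, d) - min_eigenvalue_2x2(a, b, d) for a, b, d in samples]
```

The gap vanishes exactly when the off-diagonal entry is zero, which is the case where dropping the off-diagonal part loses nothing.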

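For the density matrix completion, $\mathcal{P}_\alpha(\cdot) = \frac{1}{n}\operatorname{Tr}(\cdot)I_n$. By further using $P_2^TF(\overline{X})P_2 = 0$
and $\mathcal{P}_\beta(I_n) = 0$, we can rewrite (67) as

\[
\widehat{\Lambda} - \frac{1}{n}\operatorname{Tr}(\widehat{\Lambda})\,I_{n-r} = \frac{1}{n}\operatorname{Tr}(F(\overline{X}))\,I_{n-r}.
\]

By taking the trace on both sides, we obtain that $\widehat{\Lambda} = \frac{1}{r}\operatorname{Tr}(F(\overline{X}))\,I_{n-r}$. Since $\overline{X}$ is a
density matrix of rank r, with the choice (42) of $F$, we have that

\[
\operatorname{Tr}(\overline{X}) = \sum_{i=1}^{n}\sum_{j=1}^{r}\lambda_j(\overline{X})\,|P_{ij}|^2 = 1 \quad\text{implies}\quad \operatorname{Tr}(F(\overline{X})) = \sum_{i=1}^{n}\sum_{j=1}^{r}f_j\big(\lambda_j(\overline{X})\big)\,|P_{ij}|^2 > 0.
\]

It follows that $\widehat{\Lambda} \in \mathbb{S}^{n-r}_{++}$ and thus we obtain the rank consistency.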
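The trace step in the density matrix case can be written out explicitly. The following is a sketch of the arithmetic only, writing $\widehat{\Lambda}$ for the $(n-r)\times(n-r)$ multiplier and assuming the rewritten form $\widehat{\Lambda} - \frac{1}{n}\operatorname{Tr}(\widehat{\Lambda})I_{n-r} = \frac{1}{n}\operatorname{Tr}(F(\overline{X}))I_{n-r}$ of (67).

```latex
\begin{align*}
\operatorname{Tr}(\widehat{\Lambda}) - \frac{n-r}{n}\operatorname{Tr}(\widehat{\Lambda})
   &= \frac{n-r}{n}\operatorname{Tr}(F(\overline{X}))
   &&\Longrightarrow\quad
   \operatorname{Tr}(\widehat{\Lambda}) = \frac{n-r}{r}\operatorname{Tr}(F(\overline{X})), \\
\widehat{\Lambda}
   &= \frac{1}{n}\Big(\operatorname{Tr}(\widehat{\Lambda}) + \operatorname{Tr}(F(\overline{X}))\Big) I_{n-r}
    = \frac{1}{n}\cdot\frac{n}{r}\operatorname{Tr}(F(\overline{X}))\, I_{n-r}
   && = \frac{1}{r}\operatorname{Tr}(F(\overline{X}))\, I_{n-r}.
\end{align*}
```

So $\widehat{\Lambda}$ is a positive multiple of the identity exactly when $\operatorname{Tr}(F(\overline{X})) > 0$.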